We describe a planning algorithm, NDP2, that finds strong-cyclic solutions to nondeterministic
planning problems by using a classical planner to solve a sequence of classical
planning  problems.  NDP2  is  provably  correct,  and  fixes  several  problems  with  prior  work. 
We  also  describe  two  preprocessing  algorithms  that  can  provide  a  restricted  version  of 
the  symbolic  abstraction  capabilities  of  the  well-known  MBP  planner.  The  preprocessing 
algorithms  accomplish  this  by  rewriting  the  planning  problems,  hence  do  not  require  any 
modifications  to  NDP2  or  its  classical  planner. 

In  our  experimental  comparisons  of  NDP2  (using  FF  as  the  classical  planner)  to  MBP  in  six 
different  planning  domains,  each  planner  outperformed  the  other  in  some  domains  but  not 
others.  Which  planner  did  better  depended  on  three  things:  the  amount  of  nondeterminism 
in  the  planning  domain,  domain  characteristics  that  affected  how  well  the  abstraction 
techniques  worked,  and  whether  the  domain  contained  unsolvable  states. 

©  2014  Elsevier  B.V.  All  rights  reserved. 


1.  Introduction 

This  paper  is  about  a  way  to  use  classical  planners  to  solve  nondeterministic  planning  problems.  Given  a  nondeterministic 
planning  problem  P  and  any  classical  planner  CP,  our  NDP2  algorithm  calls  CP  on  a  sequence  of  classical  planning  problems 
derived  from  P,  and  uses  CP’s  solutions  to  construct  a  strong-cyclic  solution  for  P.  NDP2  is  based  on  the  NDP  algorithm  [30], 
but  overcomes  several  problems  with  that  prior  work.  Our  contributions  are  as  follows: 

•  NDP2  corrects  two  problems  that  NDP  had  with  unsolvable  states.  Although  NDP's  pseudocode  included  a  way  to  deal 
with  unsolvable  states  by  making  modifications  to  the  planning  domain,  its  authors  did  not  implement  this  part  of  NDP, 
and  did  not  realize  that  it  has  two  significant  problems:  (1)  when  it  encounters  unsolvable  states,  NDP  modifies  the 
domain  model  in  a  way  that  can  make  it  exponentially  larger,  and  (2)  there  are  cases  in  which  unsolvable  states  will 
cause  NDP  to  generate  incorrect  solutions. 

NDP2 overcomes both of these problems. If NDP2 is used with a classical planner CP that is sound, complete, and
guaranteed to terminate on classical planning problems, then NDP2 will be sound, complete, and guaranteed to terminate
on  nondeterministic  planning  problems. 
• Planners such as MBP [9], POND [7], and Yoyo [28] do not represent states in the usual classical fashion, but instead use
binary  decision  diagrams  (BDDs)  [8]  to  represent  sets  of  states  that  have  common  properties.  This  provides  substantial 
efficiency  gains,  by  enabling  those  planners  to  plan  for  large  sets  of  states  at  once.  To  provide  a  limited  form  of  MBP’s 
state-abstraction  capability  within  an  ordinary  classical  state  representation,  the  NDP  paper  [30]  described  a  technique 
called  “conjunctive  abstraction”  that  involved  modifying  the  planning  domain  to  include  “abstract  states”  that  represent 
sets of ordinary states. Although [30] provided examples of such abstractions, it did not include an algorithm to produce
them. 

We provide preprocessing algorithms to make two kinds of planning-domain modifications similar to conjunctive
abstraction. When used as preprocessors to NDP2, these algorithms can sometimes provide state-abstraction abilities
analogous  to  MBP’s,  and  they  preserve  NDP2’s  ability  to  be  used  with  any  classical  planner. 

•  We  provide  the  results  of  experimental  comparisons  of  NDP2  (using  FF  [21]  as  the  classical  planner)  with  MBP,  on  more 
than  4800  planning  problems  in  six  nondeterministic  planning  domains.  Unlike  [30],  in  which  the  experimental  tests 
were  limited  to  planning  domains  in  which  all  states  were  solvable,  three  of  our  six  domains  include  unsolvable  states. 
Our  experiments  showed  NDP2  outperforming  MBP  in  some  planning  domains,  and  MBP  outperforming  NDP2  in  others. 
Which  algorithm  performed  better  depended  mainly  on  (1)  the  amount  of  nondeterminism  in  the  search  space,  (2)  how 
well  the  nondeterminism  could  be  abstracted  out  (either  by  using  our  abstraction  algorithms  with  NDP2,  or  by  MBP 
using  its  BDDs),  and  (3)  whether  some  of  the  nondeterministic  outcomes  could  lead  to  unsolvable  states. 

This  paper  is  organized  as  follows.  Section  2  provides  definitions  and  notation.  Section  3  gives  an  algorithm  for  the  case 
where  all  states  are  solvable,  Section  4  extends  the  algorithm  to  deal  with  unsolvable  states,  and  Section  5  motivates 
and  describes  our  abstraction  formalisms  and  algorithms.  Section  6  provides  the  results  of  the  experimental  evaluations, 
Section  7  is  a  discussion  of  related  work,  and  Section  8  is  the  conclusion.  Appendix  A  contains  the  correctness  proofs 
for  NDP2,  Appendix  B  describes  techniques  for  translating  solution  policies  from  abstract  to  non-abstract  domains,  and 
Appendix  C  describes  a  case  where  NDP  [30]  is  unsound. 

2.  Basic  definitions  and  notation 

Below,  Sections  2.1  and  2.2  give  definitions  and  notation  for  nondeterministic  planning  domains  and  classical  planning, 
and  Section  2.3  defines  determinizations  of  nondeterministic  domains. 

2.1.  Nondeterministic  planning  domains 

A nondeterministic planning domain is one in which each action may have more than one possible outcome. Formally, it is
a pair $D = (\mathcal{L}, O)$, where $\mathcal{L}$ is a function-free first-order language with finitely many constant symbols (hence finitely many
ground atoms), and $O$ is a finite set of nondeterministic planning operators as defined below.

We will represent states in the usual classical fashion: if $F = \{\text{all ground atoms of } \mathcal{L}\}$, then a state is a subset of $F$, and
the set of all possible states is $S = 2^F$. A literal $l$ is true in $s$ if $l$ is a non-negated atom and $l \in s$, or if $l$ is a negated atom
$\neg a$ and $a \notin s$; otherwise $l$ is false in $s$.

Each operator $o \in O$ is a pair

\[ o = (\text{pre}(o), \text{effects}(o)), \]

where $\text{pre}(o)$ is a conjunction of literals called $o$'s preconditions, and $\text{effects}(o)$ is a set of conjunctions of literals called $o$'s
possible effects. Intuitively, $\text{pre}(o)$ describes what must be true in order to use $o$, and each conjunction in $\text{effects}(o)$ describes
one of the possible outcomes of using $o$. We sometimes will refer to $o$ as $o(x_1, \ldots, x_n)$, where $x_1, \ldots, x_n$ are the variable
symbols in $o$ in some canonical order.

An action $a$ is a ground instance of an operator $o$, and $\text{pre}(a)$ and $\text{effects}(a)$ are the corresponding ground instances of
$\text{pre}(o)$ and $\text{effects}(o)$. If $a$ is the action produced by replacing the variables in $o(x_1, \ldots, x_n)$ with constants $c_1, \ldots, c_n$, then
we will sometimes refer to $a$ as $o(c_1, \ldots, c_n)$. We will use $A$ to denote the finite set of all possible actions, i.e., all possible
ground instances of the operators in $O$. An action $a$ is executable in any state that satisfies $\text{pre}(a)$. For each state $s$, $A(s) \subseteq A$
is the set of all actions that are executable in $s$.

Let $a \in A(s)$, and let $e_1, \ldots, e_n$ be the conjunctions in $\text{effects}(a)$. For $i = 1, \ldots, n$, let $\gamma(s, e_i) = (s - e_i^-) \cup e_i^+$, where $e_i^+$
and $e_i^-$ are the sets of atoms that appear positively and negatively in $e_i$. Then the result of executing $a$ in $s$ is the following
set of states$^1$:

\[ \gamma(s, a) = \{\gamma(s, e_i)\}_{i=1}^{n} = \{(s - e_i^-) \cup e_i^+\}_{i=1}^{n}. \]
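The definition of $\gamma(s, a)$ can be made concrete with a minimal sketch. It assumes states are frozensets of ground-atom names and each possible effect is a pair (atoms made true, atoms made false); the `move` action and the atom names are hypothetical and used only for illustration.

```python
from typing import FrozenSet, List, Set, Tuple

State = FrozenSet[str]
# One possible effect e: (e_plus, e_minus), the atoms made true and made false.
Effect = Tuple[FrozenSet[str], FrozenSet[str]]


def apply_effect(s: State, effect: Effect) -> State:
    """gamma(s, e) = (s - e_minus) | e_plus."""
    e_plus, e_minus = effect
    return (s - e_minus) | e_plus


def result(s: State, effects: List[Effect]) -> Set[State]:
    """gamma(s, a): one successor state per possible effect of a."""
    return {apply_effect(s, e) for e in effects}


# Hypothetical action: a move that either succeeds or leaves the state unchanged.
s = frozenset({"at-a"})
move = [
    (frozenset({"at-b"}), frozenset({"at-a"})),  # outcome 1: the move succeeds
    (frozenset(), frozenset()),                  # outcome 2: nothing changes
]
successors = result(s, move)
```

Because the result is a set, two effects that happen to produce the same state yield a single successor.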

A policy is a function $\pi$ that maps some of the states into actions, i.e., $\pi : S_\pi \to A$ for some set of states $S_\pi \subseteq S$. For each
state-action pair $(s, a) \in \pi$, the intended meaning is that $a$ is the action to perform in $s$. A hyperpolicy is a function $\pi^*$ that
maps sets of states into actions, i.e., $\pi^* : \mathcal{S} \to A$, for some set $\mathcal{S} \subseteq 2^S$. For each pair $(S', a) \in \pi^*$, the intended meaning is that
$a$ is the action to perform in every state $s \in S'$ (hence there is ambiguity about what action to perform if $s$ is in more than
one $S' \in \mathcal{S}$). In the published literature on planning in nondeterministic environments, the solutions to planning problems
are defined to be policies, but for purposes of computational efficiency, most of the better-known planning algorithms (e.g.,
[37,9,28,7]) reason instead about hyperpolicies, using Binary Decision Diagrams (BDDs) to represent the sets in $\mathcal{S}$.

1 When necessary to avoid ambiguity, we will write $\gamma_D$ to refer to the value of $\gamma$ in the planning domain $D$.

The $\pi$-descendants of a state $s$ are defined recursively as follows:

• $s$ is a $\pi$-descendant of itself.

• If $s'$ is a $\pi$-descendant of $s$ and $\pi(s')$ is defined, then every $s'' \in \gamma(s', \pi(s'))$ is also a $\pi$-descendant of $s$.

A $\pi$-result of $s$ is any $\pi$-descendant $s'$ of $s$ for which $\pi(s')$ is not defined (the intuition is that if we execute $\pi$ starting at $s$
and end up at $s'$, then execution will cease). Thus we can define $\gamma(s, \pi) = \{s' \mid s' \text{ is a } \pi\text{-result of } s\}$. Note that as a special
case, if $\pi = \emptyset$ then $\gamma(s, \pi) = \{s\}$. By extension, a $\pi$-result of a set of states $S$ is any state that is a $\pi$-result of at least one
of the states in $S$.
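The recursive definition of $\pi$-descendants amounts to a graph traversal, as the following sketch shows. It assumes an explicit-state encoding in which `EDGES` maps a (state, action) pair to its set of possible successors; the three-state domain is hypothetical, and policies are plain dictionaries from states to actions.

```python
from typing import Dict, Set, Tuple

# Hypothetical explicit-state domain: (state, action) -> set of possible successors.
EDGES: Dict[Tuple[str, str], Set[str]] = {
    ("s0", "a1"): {"s1", "s2"},
    ("s1", "a2"): {"s0"},
}


def gamma(s: str, a: str) -> Set[str]:
    """Possible successors of executing action a in state s."""
    return EDGES[(s, a)]


def descendants(s: str, policy: Dict[str, str]) -> Set[str]:
    """All pi-descendants of s, including s itself."""
    seen = {s}
    stack = [s]
    while stack:
        cur = stack.pop()
        if cur in policy:  # pi(cur) is defined, so follow every outcome
            for nxt in gamma(cur, policy[cur]):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return seen


def results(s: str, policy: Dict[str, str]) -> Set[str]:
    """pi-results of s: the descendants at which the policy is undefined."""
    return {d for d in descendants(s, policy) if d not in policy}
```

With the empty policy, `results(s, {})` is `{s}`, matching the special case noted above.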

A nondeterministic planning problem is a triple $P = (D, S_0, G)$, where $D = (\mathcal{L}, O)$ is a nondeterministic planning domain,
$S_0 \subseteq S$ is a set of initial states, and $G \subseteq S$ is a set of goal states. $P$ may have different kinds of solutions [9,16]:

• A weak solution must provide a possibility of reaching a goal state, but doesn’t need to guarantee that a goal state will
always be reached. More specifically, a policy $\pi$ is a weak solution if for every $s \in S_0$, some goal state $s_g \in G$ is a
$\pi$-result of $s$.

• A strong cyclic solution is a policy $\pi$ that has the following property: for every state $s$ that is a $\pi$-descendant of $S_0$, there
is a goal state $s_g \in G$ that is a $\pi$-result of $s$. Such a policy is guaranteed to reach a goal state in every fair execution,
i.e., every execution that doesn’t remain in a cycle forever if there’s a possibility of leaving the cycle.

•  P  may  also  have  strong  solutions  [9,16],  but  we  will  not  need  that  definition  in  this  paper. 
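The two solution concepts can be checked directly from their definitions. The sketch below assumes the same hypothetical explicit-state encoding as before (`EDGES` maps a (state, action) pair to its possible successors, and a policy is a dictionary); it is an illustration of the definitions, not part of the planner.

```python
from typing import Dict, Set, Tuple

# Hypothetical explicit-state domain: (state, action) -> set of possible successors.
EDGES: Dict[Tuple[str, str], Set[str]] = {
    ("s0", "a1"): {"s1", "s2"},
    ("s1", "a2"): {"s0"},
}


def descendants(s: str, policy: Dict[str, str]) -> Set[str]:
    """All pi-descendants of s, including s itself."""
    seen, stack = {s}, [s]
    while stack:
        cur = stack.pop()
        if cur in policy:
            for nxt in EDGES[(cur, policy[cur])] - seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def results(s: str, policy: Dict[str, str]) -> Set[str]:
    """pi-results of s: descendants at which the policy is undefined."""
    return {d for d in descendants(s, policy) if d not in policy}


def is_weak_solution(policy, initial: Set[str], goals: Set[str]) -> bool:
    """Every initial state has at least one goal state among its pi-results."""
    return all(results(s, policy) & goals for s in initial)


def is_strong_cyclic_solution(policy, initial: Set[str], goals: Set[str]) -> bool:
    """Every pi-descendant of the initial states has a goal state among its pi-results."""
    reachable: Set[str] = set()
    for s in initial:
        reachable |= descendants(s, policy)
    return all(results(d, policy) & goals for d in reachable)


partial = {"s0": "a1"}            # may strand execution in s1
full = {"s0": "a1", "s1": "a2"}   # s1 loops back to s0 and retries
```

In this toy domain `partial` is a weak solution for goal `s2` but not a strong cyclic one, because `s1` is a non-goal $\pi$-result; adding the retry action at `s1` makes `full` strong cyclic.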

A state $s \in S$ is weakly solvable if the planning problem $(D, \{s\}, G)$ has at least one weak solution, and strong cyclically solvable
if $(D, \{s\}, G)$ has at least one strong cyclic solution. Otherwise $s$ is unsolvable.

If every state that is reachable from $S_0$ is weakly solvable, then $P$ is everywhere weakly solvable. Similarly, if every state
that is reachable from $S_0$ is strong-cyclically solvable, then $P$ is everywhere strong-cyclically solvable. The following lemma
(the proof is in Appendix A) shows that the two terms are equivalent, so we will just say everywhere solvable instead.

Lemma 1. A nondeterministic planning problem $P = (D, S_0, G)$ is everywhere weakly solvable iff it is everywhere strong cyclically
solvable.

2.2.  Classical  planning  domains 

An operator or action $o$ is classical (or deterministic) if $\text{effects}(o)$ contains just one conjunction of literals. A planning
domain $D = (\mathcal{L}, O)$ is classical if every operator in $O$ is classical. A planning problem $P = (D, S_0, G)$ is classical if $D$ is
classical and there is just one initial state, i.e., $S_0 = \{s_0\}$ for some $s_0 \in S$. In this case we will dispense with $S_0$ and write
$P = (D, s_0, G)$.

For classical planning problems, solutions are conventionally defined to be sequential plans rather than policies. Formally,
a plan is a sequence $p = \langle a_1, \ldots, a_k \rangle$ of classical actions. Given a state $s_0$, if there are states $s_1, \ldots, s_k$ such that for $1 \le i \le k$,
$\gamma(s_{i-1}, a_i) = \{s_i\}$, then $p$ is executable in $s_0$ and $\gamma(s_0, p) = s_k$. Given a planning problem $P = (D, s_0, G)$, a state $s$ is solvable
if there is a plan $p$ such that $\gamma(s, p) \in G$. If $s_0$ is solvable then we say that $P$ itself is solvable. If every state that is reachable
from $s_0$ is solvable, then $P$ is everywhere-solvable.

If a plan $p = \langle a_1, \ldots, a_k \rangle$ is executable at a state $s_0$, then $p$ is acyclic at $s_0$ if each state $s_0, \ldots, s_k$ produced by
executing $p$ is unique (the plan does not traverse the same state twice). In this case, $p$ corresponds to a unique policy
$\pi = \{(s_0, a_1), (s_1, a_2), \ldots, (s_{k-1}, a_k)\}$ that we will call $p$'s policy image at $s_0$.
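A short sketch of how a policy image might be computed follows. It assumes the plan is executable at the start state and that a hypothetical table `STEP` gives the single successor of each deterministic action; the function returns `None` when the plan is not acyclic, since no policy image exists in that case.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical deterministic transitions: (state, action) -> the single successor.
STEP: Dict[Tuple[str, str], str] = {
    ("s0", "a11"): "s1",
    ("s0", "a12"): "s2",
    ("s1", "a2"): "s0",
}


def policy_image(s0: str, plan: List[str]) -> Optional[Dict[str, str]]:
    """Policy image of an executable plan at s0, or None if the plan is not acyclic at s0."""
    policy: Dict[str, str] = {}
    visited = {s0}
    s = s0
    for a in plan:
        policy[s] = a          # pair (s_{i-1}, a_i)
        s = STEP[(s, a)]       # the unique successor s_i
        if s in visited:       # a state would be traversed twice
            return None
        visited.add(s)
    return policy
```

For instance, the plan `["a2", "a12"]` at `s1` passes through `s1`, `s0`, `s2` without repetition, so its image assigns `a2` to `s1` and `a12` to `s0`.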

2.3.  Determinizations  of  nondeterministic  domains 

If $o = (\text{pre}(o), \text{effects}(o))$ is a nondeterministic operator and $\text{effects}(o) = \{e_1, \ldots, e_n\}$, then the determinization of $o$ is a set
$\bar{o}$ of deterministic operators, one for each of $o$'s possible effects:

\[ \bar{o} = \{(\text{pre}(o), e_1), (\text{pre}(o), e_2), \ldots, (\text{pre}(o), e_n)\}. \]

If an action $a$ is a ground instance of $o$, then its determinization $\bar{a} = \{a_1, \ldots, a_n\}$ is defined similarly. The determinization
of a nondeterministic planning domain $D = (\mathcal{L}, O)$ is a classical planning domain $\bar{D} = (\mathcal{L}, \bar{O})$, where $\bar{O} = \bigcup_{o \in O} \bar{o}$. The
determinization of a nondeterministic planning problem $P = (D, \{s_0\}, G)$ is a classical planning problem $\bar{P} = (\bar{D}, s_0, G)$.
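Determinization is a purely syntactic transformation, which the following sketch illustrates. The `Operator` class, the string encoding of literals, and the `name_i` naming scheme are assumptions made for the example; the `origin` map records which nondeterministic operator each deterministic operator came from, which a planner needs in order to translate classical plans back.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Conjunction = Tuple[str, ...]  # literals written as strings, e.g. "at-s1" or "not at-s0"


@dataclass(frozen=True)
class Operator:
    name: str
    pre: Conjunction                   # pre(o)
    effects: Tuple[Conjunction, ...]   # one conjunction per possible outcome


def determinize(op: Operator) -> List[Operator]:
    """One deterministic operator (pre(o), e_i) for each possible effect e_i of o."""
    return [
        Operator(f"{op.name}_{i}", op.pre, (e,))
        for i, e in enumerate(op.effects, start=1)
    ]


def determinize_domain(ops: List[Operator]) -> Tuple[List[Operator], Dict[str, str]]:
    """Union of all operator determinizations, plus a map back to the original operator names."""
    det: List[Operator] = []
    origin: Dict[str, str] = {}
    for op in ops:
        for d in determinize(op):
            det.append(d)
            origin[d.name] = op.name
    return det, origin


# Hypothetical operator with two possible outcomes.
a1 = Operator("a1", ("at-s0",), (("at-s1", "not at-s0"), ("at-s2", "not at-s0")))
det_ops, origin = determinize_domain([a1])
```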

Lemma 2. For every state $s$ in a nondeterministic planning problem $P$, $s$ is weakly solvable in $P$ if and only if it is solvable in $\bar{P}$.
Algorithm 1: NDPR, a planner for nondeterministic planning problems that are everywhere-solvable. $(D, S_0, G)$ is the
planning problem, and CP is the classical planner.

 1  Procedure NDPR($D$, $S_0$, $G$, CP)
 2      $\pi \leftarrow \emptyset$; $\bar{D} \leftarrow$ a determinization of $D$
 3      loop
 4          $S \leftarrow$ {all non-goal $\pi$-results of $S_0$}
 5          if $S = \emptyset$ then
 6              return $\pi$
 7          arbitrarily select a state $s \in S$
 8          if CP($\bar{D}$, $s$, $G$) returns a solution plan $\langle \bar{a}_1, \ldots, \bar{a}_k \rangle$ then
 9              Let $a_1, \ldots, a_k$ be the nondeterministic versions of $\bar{a}_1, \ldots, \bar{a}_k$
10              for $i = 1, \ldots, k$ do
11                  if $\pi(s)$ is defined then remove $(s, \pi(s))$ from $\pi$
12                  insert $(s, a_i)$ into $\pi$
13                  $s \leftarrow \gamma_{\bar{D}}(s, \bar{a}_i)$
14                  if a goal state is a $\pi$-descendant of $s$ then break
15          else
                // CP didn't find a solution, so either CP is incomplete or the planning problem is not everywhere-solvable.
16              return Failure

Lemma 2 is proved in Appendix A. From the lemma, it follows immediately that if a nondeterministic planning problem
$P = (D, \{s_0\}, G)$ is everywhere-solvable, then its determinization $\bar{P} = (\bar{D}, s_0, G)$ also is everywhere-solvable.

3.  Algorithm  for  everywhere-solvable  planning  problems 

The  clearest  way  to  describe  NDP2  is  to  start  with  an  algorithm  for  a  special  case:  planning  problems  that  are 
everywhere-solvable.  This  section  presents  that  algorithm;  and  in  Section  4  we  will  extend  the  algorithm  to  deal  correctly 
with  unsolvable  states. 

Algorithm 1, NDPR, takes as input a nondeterministic planning problem $P = (D, S_0, G)$ and a classical planner CP. NDPR
works by calling CP on problems of the form $(\bar{D}, s, G)$, and combining CP’s solutions into a solution for $P$. It is nearly
identical to the NDP algorithm in [30], except that it omits NDP’s faulty pseudocode for unsolvable states and it specifies
exactly how to incorporate a plan into the policy (NDP left it unstated).

NDPR first initializes $\pi$ to be the empty policy, and generates the determinization $\bar{D}$ of $D$ (Line 2). Line 3 begins the
main planning loop. If every $\pi$-result of $S_0$ is a goal state, then $\pi$ is a strong cyclic solution, so NDPR returns it (Line 6).
Otherwise, NDPR selects a $\pi$-result $s$ of $S_0$ that is not a goal state, and uses CP to search for a plan that solves $s$ in $\bar{D}$.
If CP is incomplete or if $P$ is not everywhere-solvable, then CP may fail to find a solution plan for $(\bar{D}, s, G)$; in this
case NDPR returns failure (Line 16). But if CP returns a solution plan $p$, then NDPR incorporates the actions of $p$ into $\pi$
(Lines 10-14) one at a time, stopping if it finds a state that $\pi$ already weakly solves. Note that if CP is guaranteed to return
acyclic solutions, then Line 11 can be omitted and the condition in Line 14 can be replaced with a check to see if $\pi(s)$ is
defined.

Example. To illustrate how NDPR works, let $D$ and $\bar{D}$ be the nondeterministic domain and its determinization as shown in
Fig. 1 and Fig. 2. Consider the nondeterministic planning problem $P = (D, \{s_0\}, \{s_2\})$, in which the set of initial states is $\{s_0\}$
and there is a single goal state, $s_2$. In NDPR’s first iteration, NDPR calls the classical planner on the problem $(\bar{D}, s_0, \{s_2\})$.
Suppose that the classical planner returns the plan $\langle a_{12} \rangle$. NDPR will incorporate this plan into the currently empty policy $\pi$
(Lines 10-14). As a result, $s_1$ is now a non-goal $\pi$-result of $\{s_0\}$.

In NDPR’s second iteration, it will call the classical planner on $(\bar{D}, s_1, \{s_2\})$. Suppose the planner returns the plan $\langle a_2, a_{12} \rangle$.
NDPR will incorporate the first action $a_2$, but then stop incorporating the plan at Line 14 since $s_0$ already has a weak
solution. There are now no non-goal $\pi$-results of $\{s_0\}$ (the intuition is that all of the $\pi$-results of $\{s_0\}$ are goal states), and
NDPR will exit on the next iteration (Line 6). □
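The NDPR loop can be sketched over an explicit-state encoding. Everything concrete here is an assumption for illustration: the `DOMAIN` table is a hypothetical three-state domain loosely modeled on the example, deterministic actions are encoded as (action, outcome) pairs, and a breadth-first search stands in for the classical planner CP. The sketch is only meant for everywhere-solvable problems.

```python
from collections import deque

# Hypothetical explicit-state domain: (state, action) -> possible successors.
DOMAIN = {("s0", "a1"): {"s1", "s2"}, ("s1", "a2"): {"s0"}}


def determinize(domain):
    """One deterministic action (action, outcome) per possible outcome."""
    return {(s, (a, t)): t for (s, a), outs in domain.items() for t in outs}


def bfs_planner(det, s, goals):
    """Stand-in for CP: a shortest plan from s to a goal state, or None."""
    frontier, seen = deque([(s, [])]), {s}
    while frontier:
        cur, plan = frontier.popleft()
        if cur in goals:
            return plan
        for (src, act), dst in det.items():
            if src == cur and dst not in seen:
                seen.add(dst)
                frontier.append((dst, plan + [act]))
    return None


def descendants(s, policy, domain):
    """All pi-descendants of s, including s itself."""
    seen, stack = {s}, [s]
    while stack:
        cur = stack.pop()
        if cur in policy:
            for nxt in domain[(cur, policy[cur])] - seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def ndpr(domain, initial, goals, cp):
    """Sketch of Algorithm 1; returns a policy dict, or None for Failure."""
    policy, det = {}, determinize(domain)
    while True:
        reachable = set()
        for s0 in initial:
            reachable |= descendants(s0, policy, domain)
        # Non-goal pi-results of the initial states (Line 4).
        open_states = {s for s in reachable if s not in policy and s not in goals}
        if not open_states:
            return policy
        s = min(open_states)              # an arbitrary but repeatable choice
        plan = cp(det, s, goals)
        if plan is None:
            return None
        for act in plan:
            policy[s] = act[0]            # overwrite any old entry (Lines 11-12)
            s = det[(s, act)]             # follow the deterministic outcome (Line 13)
            if descendants(s, policy, domain) & goals:
                break                     # s is already weakly solved (Line 14)


solution = ndpr(DOMAIN, {"s0"}, {"s2"}, bfs_planner)
```

On this toy domain the first planner call yields a one-step plan from `s0`, the second call handles the stray outcome `s1`, and the loop then finds no open states, mirroring the two iterations of the example.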

4.  Dealing  with  unsolvable  states 

Kuter et al. [30] described a way for NDP to deal with unsolvable states by removing state-action pairs from the domain.
If CP returned failure on a state $s$, the idea was to take every state $s'$ and action $a$ such that $\pi(s') = a$ and $s \in \gamma(s', a)$
and modify the definition of $a$ to make it inapplicable in $s'$. This requires modifying $a$’s precondition to exclude $s'$ without
excluding any other states. Such a precondition will be a large disjunction that includes a positive or negative literal for
every ground atom in the planning domain, and the number of ground atoms is often exponential in the size of the domain
description. Thus NDP’s way of dealing with unsolvable states often increases the size of the domain description, and the
computational overhead of evaluating action preconditions, by an exponential amount.

In Section 4.1 we present ConstrainProblem, a procedure for modifying a classical planning problem $P = (D, s, G)$
to make some of the actions inapplicable at the first step of any solution to $P$. Unlike removing state-action pairs,
ConstrainProblem only incurs a quadratic increase in the size of the domain description per constrained action. In
Section 4.2 we present Find-Acceptable-Plan, a procedure that uses ConstrainProblem to search for an acyclic plan whose
policy image avoids known unsolvable states. In Section 4.3 we present NDP2, a modified version of NDPR (see Section 3)
that uses Find-Acceptable-Plan to avoid known unsolvable states.

Fig. 1. Graphic depiction of a nondeterministic planning domain. Circles represent states, hyperedges represent actions.

Fig. 2. Determinization of the planning domain in Fig. 1. The determinization of $a_1$ is $\{a_{11}, a_{12}\}$.

Algorithm 2: ConstrainProblem takes a planning problem $(D, s, G)$ and a set of actions $A$, and returns a new planning
problem P that has the same solutions minus the set of plans that start with an action in $A$.

1  Procedure ConstrainProblem($D$, $s$, $G$, $A$)
2      $D' \leftarrow D$; $s' \leftarrow s$
3      foreach action $o(c_1, \ldots, c_n) \in A$ do
4          $s' \leftarrow \{\text{disallowed}_o(c_1, \ldots, c_n)\} \cup s'$
5      foreach operator $o(x_1, \ldots, x_n) \in D'$ do
6          $\text{pre}(o) \leftarrow (\neg\text{disallowed}_o(x_1, \ldots, x_n)) \wedge \text{pre}(o)$
7          foreach action $a(c_1, \ldots, c_n) \in A$ do
8              $\text{effect}(o) \leftarrow \{\neg\text{disallowed}_a(c_1, \ldots, c_n)\} \cup \text{effect}(o)$
9      return $(D', s', G)$

4.1.  Restricting  which  actions  are  available 

Algorithm 2 is the ConstrainProblem procedure, which takes a classical planning problem $(D, s, G)$ and a set $A$ of actions,
and returns a new planning problem $(D', s', G)$ for which a solution is any solution to $(D, s, G)$ that does not start with an
action in $A$.

For each operator $o(x_1, \ldots, x_n) \in O$, we introduce a new predicate $\text{disallowed}_o(t_1, \ldots, t_n)$. ConstrainProblem begins by
creating a new state $s'$ that is identical to $s$ except that for each action $o(c_1, \ldots, c_n) \in A$, $s'$ contains a new atom of the form
$\text{disallowed}_o(c_1, \ldots, c_n)$. Then, for each operator $o \in D'$ with variables $x_1, \ldots, x_n$, ConstrainProblem adds $\neg\text{disallowed}_o(x_1, \ldots, x_n)$
to $o$’s preconditions (Line 6). This prevents any grounding of $o$ with the constants $c_1, \ldots, c_n$ from being applicable whenever
$\text{disallowed}_o(c_1, \ldots, c_n)$ is true.

Finally, to the effects of each action, ConstrainProblem adds the negation of the disallowed predicates that it added to the
initial state (Line 8). This ensures that $\neg\text{disallowed}_o(\ldots)$ always holds after applying any action to the initial state.
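The effect of ConstrainProblem can be illustrated on a ground (variable-free) encoding. This is a simplified sketch, not the lifted procedure of Algorithm 2: the `Action` class, the `disallowed(...)` atom naming, and the two example actions are assumptions, and each ground action gets its own flag atom rather than a parameterized predicate.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Action:
    name: str
    pre_pos: FrozenSet[str]   # atoms that must hold
    pre_neg: FrozenSet[str]   # atoms that must not hold
    add: FrozenSet[str]       # atoms made true
    delete: FrozenSet[str]    # atoms made false


def flag(name: str) -> str:
    """Hypothetical name of the ground 'disallowed' atom for an action."""
    return f"disallowed({name})"


def constrain_problem(actions: List[Action], s: Iterable[str],
                      banned: Iterable[str]) -> Tuple[List[Action], FrozenSet[str]]:
    flags = frozenset(flag(n) for n in banned)
    s_prime = frozenset(s) | flags                         # Lines 3-4: mark banned actions
    constrained = [
        replace(a,
                pre_neg=a.pre_neg | {flag(a.name)},        # Line 6: new negative precondition
                delete=a.delete | flags)                   # Lines 7-8: every action clears the flags
        for a in actions
    ]
    return constrained, s_prime


def applicable(a: Action, s: FrozenSet[str]) -> bool:
    return a.pre_pos <= s and not (a.pre_neg & s)


def execute(a: Action, s: FrozenSet[str]) -> FrozenSet[str]:
    return (s - a.delete) | a.add


# Two hypothetical deterministic actions leaving s0.
a11 = Action("a11", frozenset({"at-s0"}), frozenset(), frozenset({"at-s1"}), frozenset({"at-s0"}))
a12 = Action("a12", frozenset({"at-s0"}), frozenset(), frozenset({"at-s2"}), frozenset({"at-s0"}))
new_actions, s_prime = constrain_problem([a11, a12], {"at-s0"}, {"a11"})
```

In `s_prime` the banned action `a11` is inapplicable, while `a12` remains applicable and, once executed, removes the flag so that later steps are unrestricted.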

4.2.  Avoiding  known-unsolvable  states 

We use ConstrainProblem in Find-Acceptable-Plan (Algorithm 3), which is used to construct acyclic plans whose
nondeterministic images avoid known unsolvable states. Find-Acceptable-Plan’s parameters consist of a nondeterministic planning
domain $D$, its determinization $\bar{D}$, an initial state $s_0$, a set of goal states $G$, a classical planner CP, and a set $U$ of states to
avoid.

In Line 2, Find-Acceptable-Plan initializes five variables that it will maintain throughout its search: $p$ is the current plan,
$S$ is a list of states associated with $p$, $s$ is the last state in $S$, $B$ is a mapping from states to sets of actions known to
lead to cycles or unsolvable states, and $K$ is a set of states which can’t be part of any solution.
Algorithm 3: Find-Acceptable-Plan takes a classical planning problem, classical planner, and a set $U$ of states to avoid.
It returns a plan for which no action’s nondeterministic version can go to a state in $U$.

 1  Procedure Find-Acceptable-Plan($D$, $\bar{D}$, $s_0$, $G$, CP, $U$)
 2      $s \leftarrow s_0$; $p \leftarrow \langle\rangle$; $S \leftarrow \langle s_0 \rangle$; $B \leftarrow \emptyset$; $K \leftarrow U$
 3      loop
 4          if $s \in G$ then return $p$
 5          $A \leftarrow$ {action $b(c_1, \ldots, c_n)$ such that $(s, b(c_1, \ldots, c_n)) \in B$}
 6          $P' \leftarrow$ ConstrainProblem($\bar{D}$, $s$, $G$, $A$)
 7          call CP on the planning problem $P'$
 8          if CP returns a solution plan $\langle \bar{a}_1, \ldots, \bar{a}_k \rangle$ then
 9              foreach $i = 1, \ldots, k$ do
10                  $a_i \leftarrow$ the nondeterministic action in $D$ that corresponds to $\bar{a}_i$
11                  $s' \leftarrow \gamma_{\bar{D}}(s, \bar{a}_i)$
12                  if $s' \in S \cup K$ or $\gamma_D(s, a_i) \cap U \neq \emptyset$ then
13                      $B \leftarrow B \cup \{(s, \bar{a}_i)\}$
14                      break
15                  append $\bar{a}_i$ to $p$ and append $s'$ to $S$; $s \leftarrow s'$
16          else // CP returns Failure
17              $a' \leftarrow$ last element of $p$
18              remove last elements of $p$ and $S$
19              if $S = \langle\rangle$ then return Failure
20              $s' \leftarrow$ last element of $S$
21              $B \leftarrow B \cup \{(s', a')\}$; $K \leftarrow K \cup \{s\}$
22              $s \leftarrow s'$


In  Lines  3-22,  Find-Acceptable-Plan  repeatedly  calls  CP  to  try  and  extend  p  towards  a  goal  state  without  causing  a  cycle 
or  choosing  an  action  whose  nondeterministic  version  leads  to  a  state  in  U.  In  Line  4,  Find-Acceptable-Plan  checks  if  s,  the 
last  state  of  p,  is  a  goal  state  and  returns  p  if  it  is. 

Otherwise, Find-Acceptable-Plan calls CP to generate a plan from $s$ to a goal state. This requires overcoming two
potential problems: (1) if the plan $p$ generated by CP contains a cycle, then $p$ cannot be translated into a policy because
it will require two different actions at one of its states, and (2) if $p$ goes through a state in $U$, then it cannot be
translated into a policy that solves the nondeterministic problem $(D, \{s\}, G)$, since the states in $U$ are known to be unsolvable.
Find-Acceptable-Plan makes progress by ensuring that CP never returns a plan that starts with an action it has seen before.
First, it finds the set of actions in $B$ associated with the current state that are known to cause loops or lead to states in
$U$ (Line 5). Find-Acceptable-Plan then takes the classical problem $\bar{P} = (\bar{D}, s, G)$, and uses ConstrainProblem to create a new
planning problem $P'$ for which these actions cannot appear in the first step of a solution (Line 6). Find-Acceptable-Plan then
calls the classical planner CP on this modified problem (Line 7). If CP is sound and complete, there are two cases:

Case 1: CP returns a plan $q = \langle \bar{a}_1, \ldots, \bar{a}_k \rangle$ that leads to a goal state. Then in Lines 9 through 15, Find-Acceptable-Plan
iterates over the actions, adding them to the current plan $p$ and updating the current state $s$. If $\bar{a}_i$ leads to a state
already seen in $p$ or to a known-unsolvable state in $K$, or if its nondeterministic counterpart $a_i$ leads to a state in $U$,
then Find-Acceptable-Plan inserts $(s, \bar{a}_i)$ into $B$ and stops integrating $q$ (without removing already integrated actions).
Planning will restart at the state just before $\bar{a}_i$. If the plan is fully integrated, then, assuming CP is sound, $s$ is a goal
state, and Find-Acceptable-Plan will return $p$ on the next iteration of the main loop.

Case 2: CP cannot find a plan, and returns failure. Then Find-Acceptable-Plan backtracks to the previous state $s'$ in $S$,
adds the state-action pair $(s', a')$ leading to $s$ to $B$, and adds $s$ to the set of known-unsolvable states $K$ (Line 21). If there
is no previous state, then this means that there is no plan starting from $s_0$ and reaching a goal state whose policy
image doesn’t lead to $U$, so Find-Acceptable-Plan returns Failure (Line 19).

Example. Let $D$ and $\bar{D}$ be as in Fig. 1 and Fig. 2, and consider the call to Find-Acceptable-Plan($D$, $\bar{D}$, $s_0$, $\{s_1\}$, CP, $\{s_2\}$). The
current state $s$ will be set to $s_0$, and the list of states $S$ will be set to $\langle s_0 \rangle$.

With $B$ initially empty, the call to ConstrainProblem($\bar{D}$, $s_0$, $\{s_1\}$, $\emptyset$) will return an identical classical problem
$P' = (\bar{D}, s_0, \{s_1\})$ (Fig. 3). When Find-Acceptable-Plan calls CP on $P'$, it will return the plan $\langle a_{11} \rangle$. Since the nondeterministic
action corresponding to $a_{11}$ leads to $s_2$ (which is in $U$), Find-Acceptable-Plan will break before incorporating the first action
of the plan, and add the pair $(s_0, a_{11})$ to $B$.

On the next iteration of Find-Acceptable-Plan, $a_{11}$ will be in the set $A$ of actions to constrain. ConstrainProblem($\bar{D}$, $s_0$,
$\{s_1\}$, $\{a_{11}\}$) will return a classical problem $(D', s'_0, \{s_1\})$, with a new initial state $s'_0$ that has the same set of applicable actions
as $s_0$, except for $a_{11}$ (Fig. 4). Since this problem is unsolvable, CP returns failure. Hence Find-Acceptable-Plan also returns
failure. □
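The search performed by Find-Acceptable-Plan can be sketched over an explicit-state encoding. Several simplifications are assumptions of this sketch rather than part of Algorithm 3: the `DOMAIN` table is a hypothetical three-state domain loosely modeled on the example, deterministic actions are (action, outcome) pairs, and instead of rewriting the problem with ConstrainProblem, the stand-in planner `constrained_bfs` takes the set of banned first actions as an extra argument.

```python
from collections import deque

# Hypothetical explicit-state domain: (state, action) -> possible successors.
DOMAIN = {("s0", "a1"): {"s1", "s2"}, ("s1", "a2"): {"s0"}}


def determinize(domain):
    """One deterministic action (action, outcome) per possible outcome."""
    return {(s, (a, t)): t for (s, a), outs in domain.items() for t in outs}


def constrained_bfs(det, s, goals, banned_first):
    """Stand-in for CP on a constrained problem: the first action may not be banned."""
    frontier, seen = deque([(s, [])]), {s}
    while frontier:
        cur, plan = frontier.popleft()
        if cur in goals:
            return plan
        for (src, act), dst in det.items():
            if src != cur or dst in seen:
                continue
            if not plan and act in banned_first:
                continue
            seen.add(dst)
            frontier.append((dst, plan + [act]))
    return None


def find_acceptable_plan(domain, det, s0, goals, cp, avoid):
    """Sketch of Algorithm 3; returns a list of deterministic actions, or None for Failure."""
    s, plan, states = s0, [], [s0]
    bad = set()                 # B: (state, deterministic action) pairs to ban
    known_bad = set(avoid)      # K: states that cannot be part of any solution
    while True:
        if s in goals:
            return plan
        banned = {a for (st, a) in bad if st == s}
        q = cp(det, s, goals, banned)
        if q is not None:
            for act in q:
                nxt = det[(s, act)]
                # Reject cycles, known-bad states, and actions that may reach U.
                if nxt in states or nxt in known_bad or domain[(s, act[0])] & avoid:
                    bad.add((s, act))
                    break
                plan.append(act)
                states.append(nxt)
                s = nxt
        else:
            if not plan:        # nothing to backtrack over
                return None
            last = plan.pop()
            states.pop()
            prev = states[-1]
            bad.add((prev, last))
            known_bad.add(s)
            s = prev


DET = determinize(DOMAIN)
blocked = find_acceptable_plan(DOMAIN, DET, "s0", {"s1"}, constrained_bfs, {"s2"})
free = find_acceptable_plan(DOMAIN, DET, "s0", {"s1"}, constrained_bfs, set())
```

With `s2` in the avoid set, the only action at `s0` is rejected because one of its outcomes is `s2`, and the second planner call fails, so the search reports failure as in the example; with an empty avoid set the one-step plan is accepted.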

Fig. 3. ConstrainProblem($\bar{D}$, $s_0$, $\{s_1\}$, $\emptyset$), identical to the determinization shown in Fig. 2.

Fig. 4. ConstrainProblem($\bar{D}$, $s_0$, $\{s_1\}$, $\{a_{11}\}$), with a new initial state $s'_0$.

Theorem 1. Let CP be a sound and complete classical planner, $U$ be a set of states, $D$ be a nondeterministic planning domain, and
$\bar{D} = (\mathcal{L}, \bar{O})$ be the determinization of $D$. Let $S$ be the set of all states in $\mathcal{L}$ (i.e., $S = 2^F$), $F = \{\text{all ground atoms over } \mathcal{L}\}$, and $A$ be the
set of all possible actions (i.e., all possible ground instantiations of the planning operators in $\bar{O}$).
Then Find-Acceptable-Plan($D$, $\bar{D}$, $s_0$, $G$, CP, $U$) makes at most $|S| \cdot |A| + 1$ calls to CP, and returns an acyclic plan, if such a plan
exists, whose policy image in $D$ avoids the states in $U$.

For the proof, see Appendix A.

Algorithm 4: NDP2 is a modified version of NDPR that works correctly in all nondeterministic planning domains. The
call to CP in Line 8 is replaced with a call to Algorithm 3, which uses CP to look for plans that do not include nodes
that NDP2 has identified as unsolvable.

 1  Procedure NDP2($D$, $S_0$, $G$, CP)
 2      $\pi \leftarrow \emptyset$
 3      $\bar{D} \leftarrow$ a determinization of $D$; $U \leftarrow \emptyset$
 4      loop
 5          $S \leftarrow$ {all non-goal $\pi$-results of $S_0$}
 6          if $S = \emptyset$ then return $\pi$
 7          arbitrarily select a state $s \in S$
            // Find-Acceptable-Plan searches for an acyclic plan in $\bar{D}$ that avoids the states in $U$
 8          call Find-Acceptable-Plan($D$, $\bar{D}$, $s$, $G$, CP, $U$)
 9          if Find-Acceptable-Plan returns a solution plan $\langle \bar{a}_1, \ldots, \bar{a}_k \rangle$ then
10              Let $\langle a_1, \ldots, a_k \rangle$ be the nondeterministic actions corresponding to $\langle \bar{a}_1, \ldots, \bar{a}_k \rangle$
11              for $i = 1, \ldots, k$ do
12                  if $\pi(s)$ is defined then remove $(s, \pi(s))$ from $\pi$
13                  insert $(s, a_i)$ into $\pi$
14                  $s \leftarrow \gamma_{\bar{D}}(s, \bar{a}_i)$
15                  if a goal state is a $\pi$-descendant of $s$ then break
16          else
                // Find-Acceptable-Plan returned Failure
17              if $s \in S_0$ then return Failure
18              $U \leftarrow U \cup \{s\}$
19              foreach $s'$ such that $s \in \gamma(s', \pi(s'))$ do
20                  $\pi \leftarrow \pi \setminus \{(s', \pi(s'))\}$

4.3.  NDP2  planning  algorithm 

Algorithm  4  is  the  NDP2  algorithm,  a  modified  version  of  NDPR  that  can  deal  with  unsolvable  states.  The  key  differences 
between  NDP2  and  NDPR  are: 

•  When  NDP2  encounters  an  unsolvable  state,  it  removes  all  actions  that  lead  to  it  from  the  policy  and  adds  the  state  to 
a  set  of  known-unsolvable  states  U  (Line  18). 

•  NDP2  does  not  call  the  classical  planner  directly,  but  instead  calls  Find-Acceptable-Plan,  which  generates  solutions  that 
avoid  the  states  in  U. 


There  are  also  two  key  differences  between  NDP2  and  NDP: 
Fig. 5. A state in the Robot Navigation domain.


• NDP removed state-action pairs directly from the domain instead of using Find-Acceptable-Plan. The number of states can be doubly exponential in the size of the domain [12], so this approach can cause a doubly-exponential increase in the size of the determinization of the nondeterministic planning domain. Even when only a single state is removed from the domain, identifying one state out of a set $S$ takes on average $\log |S|$ space, which increases the size of the determinized domain by an amount that is exponential in the size of the domain.

•  As  discussed  in  Appendix  C,  NDP  used  a  plan  integration  routine  which  is  unsound  on  problems  with  unsolvable  states. 
NDP2  does  not  have  this  problem. 

Theorem 2. Let CP be a sound and complete classical planner and $P = (D, S_0, G)$ be a nondeterministic planning problem with $D = (\mathcal{L}, O)$. Let $S$ be the set of all states in $\mathcal{L}$ (i.e., $S = 2^F$, where $F$ = {all ground atoms over $\mathcal{L}$}), and let $A$ be the set of all possible actions (i.e., all possible ground instantiations of the planning operators in $O$).

Then NDP2($D$, $S_0$, $G$, CP) is sound and complete, and returns after at most $|S|^2$ calls to Find-Acceptable-Plan.

For  the  proof,  see  Appendix  A. 
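
The control flow of Algorithm 4 can be illustrated with a small Python sketch. It is not the authors' implementation: states are enumerated explicitly, `gamma` is a hypothetical dictionary mapping (state, action) pairs to sets of outcomes, and `find_acceptable_plan` is a simple stand-in for Find-Acceptable-Plan (a breadth-first search that refuses any action with an outcome in $U$) rather than a call to a classical planner. The sketch also assumes that the "non-goal $\pi$-results of $S_0$" are the reachable non-goal states for which $\pi$ is undefined.

```python
from collections import deque

def reachable(pi, gamma, start_states):
    """Return every state reachable from start_states by following policy pi."""
    seen, frontier = set(start_states), deque(start_states)
    while frontier:
        s = frontier.popleft()
        if s in pi:
            for t in gamma[(s, pi[s])]:
                if t not in seen:
                    seen.add(t)
                    frontier.append(t)
    return seen

def find_acceptable_plan(gamma, s, goals, U):
    """Stand-in (assumption) for Find-Acceptable-Plan: breadth-first search over
    single outcomes, skipping any action that has an outcome in U."""
    parent = {s: None}
    frontier = deque([s])
    while frontier:
        x = frontier.popleft()
        if x in goals:
            plan = []
            while parent[x] is not None:
                prev, a = parent[x]
                plan.append((prev, a, x))
                x = prev
            return plan[::-1]
        for (y, a), outcomes in gamma.items():
            if y == x and not (outcomes & U):
                for t in sorted(outcomes):
                    if t not in parent:
                        parent[t] = (x, a)
                        frontier.append(t)
    return None

def ndp2(gamma, S0, goals):
    pi, U = {}, set()
    while True:
        # Lines 5-7: open states are reachable non-goal states with no action yet.
        S = [s for s in reachable(pi, gamma, S0) if s not in goals and s not in pi]
        if not S:
            return pi
        s = min(S)  # "arbitrarily select"; min() only keeps the run deterministic
        plan = find_acceptable_plan(gamma, s, goals, U)
        if plan is not None:
            for x, a, nxt in plan:                      # Lines 11-15
                pi[x] = a
                if reachable(pi, gamma, [nxt]) & goals:
                    break
        else:
            if s in S0:                                 # Line 17
                return None
            U.add(s)                                    # Line 18
            for x in [x for x, a in pi.items() if s in gamma[(x, a)]]:
                del pi[x]                               # Lines 19-20

# Hypothetical toy domain: "risky" may lead to a dead end, "safe" may loop.
GAMMA = {
    ("s0", "risky"): {"g", "dead"},
    ("s0", "safe"): {"s0", "s1"},
    ("s1", "go"): {"g"},
}
policy = ndp2(GAMMA, {"s0"}, {"g"})
print(policy)
```

In this toy domain the first plan uses the risky action; once the dead-end state is found to be unsolvable it is added to `U`, the offending state-action pair is removed, and the next search is forced onto the safe action.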

5.  Abstractions  and  compound  abstractions 

In  nondeterministic  planning  domains,  some  major  representation  and  reasoning  problems  can  occur  if  each  action 
has  a  very  large  number  of  possible  outcomes.  Probably  the  best-known  example  of  this  is  the  Robot  Navigation  domain 
[25,37,9,28,29], which is illustrated in Fig. 5. In this domain, there is a building with several rooms, and a robot that needs
to  go  among  these  rooms  to  deliver  packages.  To  go  into  or  out  of  a  room,  the  robot  may  need  to  open  a  door,  and  there 
is  a  child  (the  “kid”)  who  can  interfere  with  this  by  running  around  very  quickly,  nondeterministically  opening  and  closing 
some  of  the  doors.  This  problem  can  be  translated  into  a  single-agent  nondeterministic  planning  problem  by  representing 
the  kid’s  actions  as  nondeterministic  outcomes  of  the  robot’s  actions  [25,9]. 

In a Robot Navigation domain with $k$ “kid doors” (i.e., doors that the kid can open and close), each of the robot’s actions can have $2^k$ possible outcomes: one for each possible combination of open and closed kid doors. If a planner has to plan for each of these outcomes separately, then this causes an exponential blowup in the amount of space needed to represent the plan, and in the amount of time needed to compute it.

Planners such as MBP [9], POND [7], and Yoyo [28] tackle this problem by using BDDs [8] to represent and reason about sets of states rather than individual states. For example, consider the problem of finding a policy $\pi$ that will move the robot through door d1 in Fig. 5, regardless of which kid doors are open and which ones are closed. This policy will need to contain $2^k$ state-action pairs: one for each possible combination of open and closed kid doors. But in each of the $2^k$ states, the only thing that matters is whether d1 is open or closed. A planner that uses a BDD-based state representation can generate a much smaller hyperpolicy (see Section 2) such as

$\pi^* = \{(S_1, a_1), (S_2, a_2)\}$,  (1)

where $S_1$ = {all states in which the robot is in r1 and the door is open}, $S_2$ = {all states in which the robot is in r1 and the door is closed}, $a_1$ is the action of moving the robot from r1 to the corridor, and $a_2$ is the action of opening the door.
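
The size difference between an explicit policy and the hyperpolicy of Eq. (1) can be made concrete with a short Python sketch. The door names, the action strings, and the encoding of a state as the set of open doors are all illustrative assumptions; the robot is assumed to be in r1 throughout.

```python
from itertools import product

DOORS = ["d1", "d2", "d3"]  # hypothetical kid doors (k = 3); d1 is the door to pass

def explicit_policy(doors):
    """Build one state-action pair per combination of open/closed kid doors."""
    pi = {}
    for bits in product((False, True), repeat=len(doors)):
        open_doors = frozenset(d for d, is_open in zip(doors, bits) if is_open)
        pi[open_doors] = "move(r1, corridor)" if "d1" in open_doors else "open(d1)"
    return pi

def hyperpolicy(open_doors):
    """Two entries only: S1 (d1 open) and S2 (d1 closed); other doors are ignored."""
    return "move(r1, corridor)" if "d1" in open_doors else "open(d1)"

pi = explicit_policy(DOORS)
print(len(pi))  # number of explicit state-action pairs, i.e. 2**k
```

Both representations prescribe the same action in every state; the explicit table simply repeats each of the two decisions once per irrelevant door combination.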

It  is  not  feasible  for  NDP2  to  use  a  similar  BDD  representation.  That  would  require  extensive  modifications  to  NDP2’s 
classical  planner  CP,  which  conflicts  with  the  objective  of  allowing  CP  to  be  any  classical  planner.  However,  we  sometimes 
can  get  some  of  the  same  benefits,  without  having  to  modify  CP  at  all,  by  preprocessing  the  planning  domain  D  to  produce 
an  abstracted  planning  domain  D*  in  which  some  of  the  states  represent  sets  of  states  in  D.  Once  this  has  been  done,  NDP2 
can  be  called  on  D*  rather  than  D. 
For example, if D* contains two “abstract states” that represent the sets $S_1$ and $S_2$ above, then in D*, NDP2 can plan how to go through d1 with only two calls to CP. In this case, the solution to the planning problem is the same hyperpolicy $\pi^*$ as in Eq. (1), but with $S_1$ and $S_2$ represented by abstract states rather than BDDs.

The  conjunctive  abstraction  techniques  in  [30]  were  an  initial  version  of  that  approach.  However,  that  work  did  not 
include  a  formal  definition  of  conjunctive  abstraction,  and  all  of  the  modifications  to  the  states  and  planning  operators 
were  done  manually.  This  left  it  unclear  how  or  whether  the  approach  could  be  generalized,  and  whether  it  could  be  done 
automatically.  In  the  following  subsections,  we  develop  an  approach  similar  to  conjunctive  abstraction;  but  we  define  it 
formally  and  provide  pseudocode  for  the  translations. 

5.1.  Language  and  states 

Let $D = (\mathcal{L}, O)$ be a nondeterministic planning domain, and let $\mathcal{L}^*$ be an augmented version of $\mathcal{L}$ such that for every predicate symbol $p$ of $\mathcal{L}$, $\mathcal{L}^*$ includes both $p$ and a new predicate symbol $p^*$ of the same arity as $p$. If $\alpha = p(c_1, \ldots, c_n)$ is any ground atom of $\mathcal{L}$, then $\alpha^* = p^*(c_1, \ldots, c_n)$ is the atom produced from $\alpha$ by replacing $p$ with $p^*$. We will call $\alpha^*$ an abstraction of $\alpha$ and $\neg\alpha$, because its purpose is for use in representing states in which $\alpha$ may be either true or false.

If $s$ is a state and $\alpha \notin s$, then according to the usual classical planning semantics, $\alpha$ is false in $s$ and $\alpha$ is true in the state $s \cup \{\alpha\}$. If we let $s' = s \cup \{\alpha^*\}$, then $s'$ is an abstract state that is intended to represent both of the states $s$ and $s \cup \{\alpha\}$. More generally:

Definition 1. If $s$ is a state and $A = \{\alpha_1, \ldots, \alpha_k\}$ is a set of ground atoms that are not in $s$, then $s' = s \cup \{\alpha_1^*, \ldots, \alpha_k^*\}$ is an abstract state, and the set of states that are represented by $s'$ is $\{s \cup A' \mid A' \subseteq A\}$. We let $[s']$ denote this set of states.

There is an important difference between abstract states and the belief states used in partially observable planning problems. If an abstract state $s'$ contains the atom $\alpha^*$, it does not mean that $\alpha$’s truth value will be unknown at execution time. Instead, $s'$ represents a set of fully observable states in which $\alpha$ may be either true or false, so that we can plan for these states simultaneously.


Example.  In  the  Robot  Navigation  domain,  consider  all  states  in  which  the  robot  and  the  packages  are  at  the  locations 
shown  in  Fig.  5,  and  all  doors  are  closed  except  that  d6  and  d7  may  each  be  either  open  or  closed.  There  are  four  such 
states: 


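$s_1 = \{\mathrm{in}(r1), \mathrm{loc}(a, r1), \mathrm{loc}(b, r4), \mathrm{loc}(c, r1), \mathrm{open}(d6), \mathrm{open}(d7)\}$;  (2)

$s_2 = \{\mathrm{in}(r1), \mathrm{loc}(a, r1), \mathrm{loc}(b, r4), \mathrm{loc}(c, r1), \mathrm{open}(d6)\}$;  (3)

$s_3 = \{\mathrm{in}(r1), \mathrm{loc}(a, r1), \mathrm{loc}(b, r4), \mathrm{loc}(c, r1), \mathrm{open}(d7)\}$;  (4)

$s_4 = \{\mathrm{in}(r1), \mathrm{loc}(a, r1), \mathrm{loc}(b, r4), \mathrm{loc}(c, r1)\}$.  (5)

Let

$s^* = \{\mathrm{in}(r1), \mathrm{loc}(a, r1), \mathrm{loc}(b, r4), \mathrm{loc}(c, r1), \mathrm{open}^*(d6), \mathrm{open}^*(d7)\}$.  (6)

Then the set of all states represented by $s^*$ is $\{s_1, s_2, s_3, s_4\}$. □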
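
Definition 1 can be checked mechanically. The following Python sketch enumerates $[s']$ for an abstract state; the encoding of atoms as strings, with abstract atoms written as `p*(...)`, is an assumption made for this illustration only.

```python
from itertools import combinations

def unstar(atom):
    """Map an abstract atom p*(...) back to the ordinary atom p(...)."""
    return atom.replace("*(", "(", 1)

def represented_states(abstract_state):
    """Return [s'] = {s ∪ A' | A' ⊆ A} for an abstract state s' = s ∪ {α* | α ∈ A}."""
    s = frozenset(a for a in abstract_state if "*(" not in a)
    A = sorted(unstar(a) for a in abstract_state if "*(" in a)
    return {
        s | frozenset(subset)
        for r in range(len(A) + 1)
        for subset in combinations(A, r)
    }

BASE = {"in(r1)", "loc(a,r1)", "loc(b,r4)", "loc(c,r1)"}
s_star = BASE | {"open*(d6)", "open*(d7)"}
states = represented_states(s_star)
print(len(states))  # one state per subset of {open(d6), open(d7)}
```

Applied to the abstract state of Eq. (6), the function returns the four states of Eqs. (2)-(5).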

5.2.  Operators  with  abstract  effects 

We  now  will  describe  a  way  to  rewrite  planning  operators  to  produce  abstract  states. 

Let $o$ be any operator in $D$, and let $E = \mathrm{effects}(o)$. Suppose that two of the conjunctions in $E$ are $e_1 = e \wedge \alpha$ and $e_2 = e \wedge \neg\alpha$, where $e$ is a (possibly empty) conjunction of literals and $\alpha$ is a literal not in $e$. In other words, one possible effect of $o$ is to make both $e$ and $\alpha$ true, and another possible effect of $o$ is to make $e$ true and $\alpha$ false. Then we can define the abstraction of $E$ over $\{e_1, e_2\}$ to be the set of conjunctions

$E' = (E \setminus \{e_1, e_2\}) \cup \{e \wedge \alpha^* \wedge \neg\alpha\}$.

The reason for including $\neg\alpha$ in this equation is that we will want to use $E'$ for the effects of an abstract operator, and we need to ensure that such an operator will work correctly when executed in a state $s$ that contains $\alpha$. Recall that the intended meaning of $\alpha^*$ is to assert that we are in an abstract state in which $\alpha$ may be either true or false, hence it would be inconsistent for the abstract state to also contain an assertion that $\alpha$ is true.

We can perform the abstraction process iteratively, abstracting $E'$ over a pair of conjunctions to get $E''$, abstracting $E''$ over another pair of conjunctions to get $E'''$, and so forth until we get an abstraction $E^*$ of $E$ that is maximal (i.e., $E^*$ cannot be abstracted any further).

In  general,  there  may  be  more  than  one  maximal  abstraction  of  E.  Algorithm  5  is  a  simple  greedy  algorithm  to  compute 
one  of  them  (we  do  not  care  which  one).  After  the  following  example,  we  will  define  an  abstract  operator  whose  effects 
are  E*. 
Algorithm 5: Compute a maximal abstraction of a set of conjunctions.

1  procedure Create-Abstract-Conjunction($E$)
2    while there is an abstractable pair of conjunctions $e_1, e_2 \in E$ do
3      $E \leftarrow$ the abstraction of $E$ over $\{e_1, e_2\}$
4    return $E$


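Example. Consider a Robot Navigation problem in which there are two kid doors, d6 and d7. Here is a nondeterministic action $a$ to open d1 when the robot is in room r1:

$\mathrm{pre}(a) = \mathrm{in}(r1) \wedge \neg\mathrm{open}(d1)$,  (7)
$\mathrm{effects}(a) = \{e_1, e_2, e_3, e_4\}$,  (8)

where

$e_1 = \mathrm{open}(d1) \wedge \neg\mathrm{open}(d6) \wedge \neg\mathrm{open}(d7)$;
$e_2 = \mathrm{open}(d1) \wedge \neg\mathrm{open}(d6) \wedge \mathrm{open}(d7)$;
$e_3 = \mathrm{open}(d1) \wedge \mathrm{open}(d6) \wedge \neg\mathrm{open}(d7)$;
$e_4 = \mathrm{open}(d1) \wedge \mathrm{open}(d6) \wedge \mathrm{open}(d7)$.

If we let $E = \mathrm{effects}(a)$, then $E$ can be abstracted three times. The first abstraction is to replace $e_1$ and $e_2$ with

$e_5 = \mathrm{open}(d1) \wedge \neg\mathrm{open}(d6) \wedge \mathrm{open}^*(d7) \wedge \neg\mathrm{open}(d7)$,

the second one is to replace $e_3$ and $e_4$ with

$e_6 = \mathrm{open}(d1) \wedge \mathrm{open}(d6) \wedge \mathrm{open}^*(d7) \wedge \neg\mathrm{open}(d7)$,

and the third one is to replace $e_5$ and $e_6$ with

$e_7 = \mathrm{open}(d1) \wedge \mathrm{open}^*(d6) \wedge \neg\mathrm{open}(d6) \wedge \mathrm{open}^*(d7) \wedge \neg\mathrm{open}(d7)$.

This produces the maximal abstraction

$E^* = \{\mathrm{open}(d1) \wedge \mathrm{open}^*(d6) \wedge \neg\mathrm{open}(d6) \wedge \mathrm{open}^*(d7) \wedge \neg\mathrm{open}(d7)\}$.  (9)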
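
A minimal Python sketch of the greedy procedure in Algorithm 5, run on the effects of Eq. (8), is shown below. The data representation is an assumption: a conjunction is a frozenset of (atom, polarity) pairs, atoms are strings, and the helper names `star` and `abstractable` are hypothetical.

```python
from itertools import combinations

def star(atom):
    """Return the abstract atom p*(...) for a ground atom p(...)."""
    pred, rest = atom.split("(", 1)
    return f"{pred}*({rest}"

def abstractable(e1, e2):
    """If e1 = e ∧ α and e2 = e ∧ ¬α (in either order), return (e, α); else None."""
    d1, d2 = e1 - e2, e2 - e1
    if len(d1) == 1 and len(d2) == 1:
        [(a1, p1)] = d1
        [(a2, p2)] = d2
        if a1 == a2 and p1 != p2:
            return e1 & e2, a1
    return None

def create_abstract_conjunction(E):
    """Greedily abstract pairs of conjunctions until no abstractable pair remains."""
    E = set(E)
    changed = True
    while changed:
        changed = False
        for e1, e2 in combinations(E, 2):
            hit = abstractable(e1, e2)
            if hit:
                e, alpha = hit
                E -= {e1, e2}
                # Replace the pair with e ∧ α* ∧ ¬α.
                E.add(e | {(star(alpha), True), (alpha, False)})
                changed = True
                break
    return E

# The four effects e1..e4 of Eq. (8).
E = {
    frozenset({("open(d1)", True), ("open(d6)", s6), ("open(d7)", s7)})
    for s6 in (False, True)
    for s7 in (False, True)
}
E_star = create_abstract_conjunction(E)
print(E_star)
```

The order in which pairs are picked may differ from the order used in the example above, but for this input every order ends in the single conjunction of Eq. (9).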

Definition 2. If $o$ is a planning operator (or an action), then an abstraction of $o$ is an operator (or action) $o^*$ such that

1. $\mathrm{pre}(o^*)$ is the result of modifying $\mathrm{pre}(o)$ by replacing each negative literal $\neg\alpha$ with the conjunction $\neg\alpha \wedge \neg\alpha^*$;

2. $\mathrm{effects}(o^*)$ is a maximal abstraction of $\mathrm{effects}(o)$.

The reason for replacing negative literals with conjunctions in $\mathrm{pre}(o^*)$ is to prevent $o^*$ from being applied in cases where applying it would be unsound. It is not necessary to replace positive literals with conjunctions, because no state will ever contain both $\alpha$ and $\alpha^*$.

Example. Let $s_1, s_2, s_3, s_4, s^*$ be as in Eqs. (2)-(6), and $a$ be as in Eqs. (7)-(8). Then the following action $a^*$ is an abstraction of $a$:

$\mathrm{pre}(a^*) = \mathrm{in}(r1) \wedge \neg\mathrm{open}(d1) \wedge \neg\mathrm{open}^*(d1)$

$\mathrm{effects}(a^*) = \{\mathrm{open}(d1) \wedge \mathrm{open}^*(d6) \wedge \neg\mathrm{open}(d6) \wedge \mathrm{open}^*(d7) \wedge \neg\mathrm{open}(d7)\}$.  (10)

Thus $\gamma(s_1, a^*) = \{s^*\}$.

In $a^*$’s precondition, the literal $\neg\mathrm{open}^*(d1)$ prevents $a^*$ from being applied to abstract states where applying it would be unsound, such as the following state:

$s^{**} = \{\mathrm{in}(r1), \mathrm{loc}(a, r1), \mathrm{loc}(b, r4), \mathrm{loc}(c, r1), \mathrm{open}^*(d1), \mathrm{open}^*(d6)\}$.

Before $a^*$ can be applied, $[s^{**}]$ must first be split into two subsets: the states that satisfy open(d1) and the states that don’t. $a^*$ will be applicable to the second subset but not the first one. □
To provide a means for splitting abstract states into subsets, we will define, for each predicate symbol $p$ of $D$, a splitting operator split-$p$ such that

$\mathrm{pre}(\text{split-}p) = \{p^*(x_1, \ldots, x_n)\}$;

$\mathrm{effects}(\text{split-}p) = \{\neg p^*(x_1, \ldots, x_n) \wedge p(x_1, \ldots, x_n),\ \neg p^*(x_1, \ldots, x_n)\}$;

where $n$ is the arity of $p$.

Example. Continuing the previous example, the operator split-open($x$) has

$\mathrm{pre}(\text{split-open}) = \{\mathrm{open}^*(x)\}$;

$\mathrm{effects}(\text{split-open}) = \{\neg\mathrm{open}^*(x) \wedge \mathrm{open}(x),\ \neg\mathrm{open}^*(x)\}$.  (11)

Thus, split-open(d1) will split $s^{**}$ into a pair of abstract states, one in which d1 is open and one in which it is closed:

$\{\mathrm{in}(r1), \mathrm{loc}(a, r1), \mathrm{loc}(b, r4), \mathrm{loc}(c, r1), \mathrm{open}(d1), \mathrm{open}^*(d6)\}$;

$\{\mathrm{in}(r1), \mathrm{loc}(a, r1), \mathrm{loc}(b, r4), \mathrm{loc}(c, r1), \mathrm{open}^*(d6)\}$.  □

Note  that  although  splitting  operators  resemble  nondeterministic  planning  operators  syntactically,  their  semantics  are 
quite  different:  they  do  not  correspond  to  actions  in  D,  and  their  possible  outcomes  do  not  model  nondeterminism  in  D. 
Instead,  they  simply  perform  bookkeeping  operations  for  the  purpose  of  translating  sets  of  states  (represented  as  abstract 
states)  back  into  ordinary  states,  and  they  do  not  appear  in  the  solution  policies  returned  by  NDP2. 
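
The bookkeeping performed by a splitting operator can be sketched in a few lines of Python. The string encoding of atoms and the function name `split` are assumptions for illustration; the two returned states correspond to the two effects of split-p.

```python
def split(state, abstract_atom):
    """Apply a splitting operator to an abstract state containing abstract_atom.

    Returns the two outcomes given by the effects of split-p: one in which
    p*(...) is replaced by p(...), and one in which p*(...) is simply removed.
    """
    if abstract_atom not in state:
        raise ValueError("precondition of split-p is not satisfied")
    rest = frozenset(state) - {abstract_atom}
    atom = abstract_atom.replace("*(", "(", 1)
    return rest | {atom}, rest

# The abstract state s** from the example above.
s_2star = {"in(r1)", "loc(a,r1)", "loc(b,r4)", "loc(c,r1)", "open*(d1)", "open*(d6)"}
opened, closed = split(s_2star, "open*(d1)")
print(sorted(opened))
print(sorted(closed))
```

Both results are still abstract states, because open*(d6) is untouched; only the uncertainty about d1 has been resolved.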

Definition 3. An abstraction of a nondeterministic planning domain $D$ is a planning domain $D^*$ in which the set of operators is $O^* \cup Z$, where

• $O^*$ contains an abstraction of each planning operator $o$ in $D$;

• $Z$ contains a splitting operator split-$p$ for every predicate $p$ in $\mathcal{L}$ such that $p^*$ appears in the effects of at least one operator $o \in O^*$.

By extension, if $P = (D, S_0, G)$ is a planning problem in $D$, then we will call $P^* = (D^*, S_0, G)$ an abstraction of $P$.

Since a solution $\pi^*$ to an abstracted problem represents a hyperpolicy, it is possible to extract an ordinary policy $\pi$ from it. Algorithm 7 in Appendix B is an algorithm for doing this. The basic idea is quite similar to a policy-extraction algorithm that is provided with the MBP planner. Just as with MBP’s policy-extraction algorithm, which is almost never used, there is no real need for Algorithm 7. Given any state $s$, finding the action to perform in $s$ is no harder to do with $\pi^*$ than with $\pi$, and in domains such as the Robot Navigation domain, $\pi^*$ is much easier to use since $\pi$ is exponentially larger.

5.3.  Compound  abstractions 

In  order  to  create  abstract  planning  problems,  we  modified  the  planning  operators’  effects  to  produce  abstractions  of 
pairs  of  literals.  But  the  preconditions  of  each  planning  operator  still  referred  to  the  original  literals  rather  than  the  abstract 
ones,  making  it  necessary  to  use  splitting  operators  to  map  some  of  the  abstract  literals  back  to  the  original  literals  before 
applying the planning operator. When certain conditions are satisfied, it is possible to modify some of the planning operators’ preconditions to refer directly to the abstract literals, removing the need for the splitting operators. We provide an algorithm to do this.

Let $P^* = (D^*, S_0, G)$ be an abstraction of a planning problem $P = (D, S_0, G)$. Let $Z$ and $O^*$ be the sets of splitting operators and planning operators in $P^*$. A splitting operator split-$p \in Z$ is compoundable with a planning operator $o \in O^*$ if the following conditions hold:

• $p$ occurs exactly once in $\mathrm{pre}(o)$, in a non-negated atom $\alpha$;

• Each conjunction $e \in \mathrm{effects}(o)$ contains at most one of $\alpha$, $\neg\alpha$, and $\alpha^*$, and no other atom in $\mathrm{effects}(o)$ is unifiable with $\alpha$ or $\alpha^*$;

• $p$ does not appear in $G$.

For each $o \in O^*$, we let $Z_o$ be the set of all splitting operators that are compoundable with $o$. For each splitting operator split-$p \in Z_o$, we let the compound operator split-$p \cdot o$ be an operator whose precondition is $\mathrm{pre}(o)$ with $\alpha$ replaced by $\alpha^*$, and whose effects are $\mathrm{effects}(o)$ with the following modifications:

• for each effect $e \in \mathrm{effects}(o)$ that does not contain $\alpha^*$ or $\neg\alpha$, replace $e$ with $e \wedge \neg\alpha^* \wedge \alpha$;

• for each effect $e \in \mathrm{effects}(o)$ that contains $\neg\alpha$, replace $\neg\alpha$ with $\neg\alpha^*$;

• add an additional effect $\neg\alpha \wedge \neg\alpha^*$, to represent the case where split-$p$’s nondeterministic outcome is $\neg\alpha \wedge \neg\alpha^*$ (whence $o$ is inapplicable).

Algorithm 6: Compute a compound abstraction of a planning problem. $Z$ is the set of splitting operators in $D^*$, and $O^*$ is the set of non-splitting planning operators in $D^*$.

1  Procedure Create-Compound-Operators($O^*$, $Z$)
2    $O^{**} \leftarrow O^*$
3    foreach planning operator $o \in O^*$ do
4      $B \leftarrow$ {all abstract predicates in $\mathrm{pre}(o)$}
5      foreach set of abstract predicates $B' \in 2^B$ do
6        $o^{**} \leftarrow$ split-$p_1 \cdot$ split-$p_2 \cdot \ldots \cdot$ split-$p_{|B'|} \cdot o$, where $|B'|$ is the size of $B'$ and $p_1, p_2, \ldots, p_{|B'|} \in B'$
7        insert $o^{**}$ into $O^{**}$
8    return $O^{**}$

Example. Let $a^*$ be as in (10), and split-open be as in (11). Then split-open(d1) and $a^*$ are not compoundable, because $\mathrm{pre}(a^*)$ contains $\neg\mathrm{open}(d1)$ rather than $\mathrm{open}(d1)$. But consider the following action $b^*$, which is an abstraction of an action for exiting room r1 through door d1:

$\mathrm{pre}(b^*) = \mathrm{in}(r1) \wedge \mathrm{open}(d1)$;

$\mathrm{effects}(b^*) = \{\neg\mathrm{in}(r1) \wedge \mathrm{in}(hall) \wedge \mathrm{open}^*(d6) \wedge \mathrm{open}^*(d7)\}$.

In most Robot Navigation problems, the goal $G$ consists entirely of package locations, so that the open predicate does not occur in $G$, whence split-open(d1) is compoundable with $b^*$. The compound operator split-open(d1) $\cdot\ b^*$ has

$\mathrm{pre}(\text{split-open}(d1) \cdot b^*) = \mathrm{in}(r1) \wedge \mathrm{open}^*(d1)$;

$\mathrm{effects}(\text{split-open}(d1) \cdot b^*) = \{\neg\mathrm{in}(r1) \wedge \mathrm{in}(hall) \wedge \mathrm{open}^*(d6) \wedge \mathrm{open}^*(d7) \wedge \neg\mathrm{open}^*(d1) \wedge \mathrm{open}(d1),\ \mathrm{in}(r1) \wedge \neg\mathrm{open}(d1) \wedge \neg\mathrm{open}^*(d1)\}$.  □
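
The three rewriting rules for compound operators can be sketched in Python as follows. This is an illustration under assumptions, not the authors' procedure: preconditions and effects are sets of (atom, polarity) pairs, and the function names are hypothetical. The sketch applies the rules literally, so its extra "inapplicable" effect is just $\neg\alpha \wedge \neg\alpha^*$; the example above additionally lists in(r1), which is already true in that case.

```python
def star(atom):
    """Return the abstract atom p*(...) for a ground atom p(...)."""
    pred, rest = atom.split("(", 1)
    return f"{pred}*({rest}"

def compound(pre, effects, alpha):
    """Build split-p · o for an operator o whose precondition contains atom alpha."""
    a_star = star(alpha)
    # Precondition: pre(o) with α replaced by α*.
    new_pre = (set(pre) - {(alpha, True)}) | {(a_star, True)}
    new_effects = []
    for e in effects:
        if (alpha, False) in e:
            # Rule 2: replace ¬α with ¬α*.
            e = (e - {(alpha, False)}) | {(a_star, False)}
        elif (a_star, True) not in e:
            # Rule 1: e contains neither α* nor ¬α, so conjoin ¬α* ∧ α.
            e = e | {(a_star, False), (alpha, True)}
        new_effects.append(frozenset(e))
    # Rule 3: the split produced ¬α ∧ ¬α*, so o itself was inapplicable.
    new_effects.append(frozenset({(alpha, False), (a_star, False)}))
    return frozenset(new_pre), new_effects

# The action b* from the example above.
pre_b = {("in(r1)", True), ("open(d1)", True)}
eff_b = [frozenset({("in(r1)", False), ("in(hall)", True),
                    ("open*(d6)", True), ("open*(d7)", True)})]
pre_c, eff_c = compound(pre_b, eff_b, "open(d1)")
print(sorted(pre_c))
```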

If  two  splitting  operators  split-p  and  split-q  are  both  compoundable  with  o,  then  it  is  not  hard  to  show  that  split-p  is 
compoundable  with  split-q  •  o. 

If $o \in O^*$, and if $Z' = \{\text{split-}p_1, \ldots, \text{split-}p_k\}$ is an ordered set of splitting operators that are compoundable with $o$, then we will define

$Z' \cdot o = \text{split-}p_1 \cdot \text{split-}p_2 \cdot \ldots \cdot \text{split-}p_k \cdot o$.

We will define

$O^{**} = \{Z' \cdot o \mid o \in O^* \text{ and } Z' \subseteq Z_o\}$,

where $Z_o$ = {all splitting operators in $Z$ that are compoundable with $o$}, and where we assume an arbitrary sequential order on the operators in each subset $Z'$ of $Z_o$. Thus $O^{**}$ includes all of the compound operators and all of the operators in $O^*$. Let $Z_{NC}$ be the set of all splitting operators that are non-compoundable, i.e., for each split-$p \in Z_{NC}$, either $p$ appears in the goal $G$ or $p$ is in the precondition of some operator $o$ but split-$p$ is not compoundable with $o$. Then the planning problem $P^{**} = (\mathcal{L}, O^{**} \cup Z_{NC}, G)$ is a compound-abstract version of $P^*$.

Algorithm  6  is  a  high-level  description  of  our  procedure  to  automatically  create  compound  abstractions  of  planning 
operators  given  a  set  of  abstract  predicates  and  the  splitting  operators  for  those  predicates  in  the  planning  domain.  For  each 
planning operator $o$ in the planning domain, Algorithm 6 first collects in the set $B$ all of the abstract predicates that appear in the preconditions of $o$ (Line 4). For each subset $B'$ of $B$, the algorithm then creates a compound operator $o^{**}$ from the predicates in that subset and the planning operator $o$. Every subset is considered for the sake of completeness: the compound abstraction must create a planning operator for each possible case where some of the literals in $o$’s precondition are abstracted and the rest are not.

For each abstract predicate $p$ in $B'$, Algorithm 6 first finds the splitting operator for $p$ and then creates a compound abstraction of $o$ with that splitting operator (Line 6). The compound operator $o^{**}$ is then inserted into the set of operators to be returned by the algorithm (Line 7).
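
The subset enumeration in Algorithm 6 can be sketched in Python as below. The sketch only generates operator names, to show how many compound operators arise; the input format (a dictionary from operator name to its list $B$ of abstract predicates) and the operator and predicate names are hypothetical.

```python
from itertools import combinations

def create_compound_operators(operators):
    """Return the names of all operators in O**.

    `operators` maps an operator name to the list B of abstract predicates
    that occur in its precondition.
    """
    result = []
    for name, B in operators.items():
        for r in range(len(B) + 1):
            for subset in combinations(B, r):
                # The empty subset yields o itself, so O* is contained in O**.
                parts = [f"split-{p}" for p in subset] + [name]
                result.append(" · ".join(parts))
    return result

ops = create_compound_operators({"b*": ["open"], "c*": ["open", "locked"]})
for op in ops:
    print(op)
```

An operator with $|B|$ abstract predicates in its precondition contributes $2^{|B|}$ operators, so this step is exponential in the number of abstractable precondition atoms per operator, not in the size of the domain.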

Algorithm 8 in Appendix B is an algorithm to translate a compound-abstract solution $\pi^{**}$ for $P^{**}$ into an abstract policy $\pi^*$. The basic idea is quite simple: for each action in $\pi^{**}$ that is compound, the algorithm separates it into its two component pieces (a splitting action and an ordinary abstract action). By first running Algorithm 8 and then running Algorithm 7, one could extract an ordinary policy $\pi$. However, as we pointed out at the end of Section 5.2, there is no real need to do this since the abstract policy $\pi^*$ is easier to work with.
6. Experimental evaluation


We implemented NDP2 in Common Lisp, and compared it experimentally with MBP [9] in six fully-observable nondeterministic planning domains that were chosen to present a variety of different issues for the planners to deal with. As NDP2’s classical planner in these experiments, we used FF [21], since it was the classical planner that had worked best with NDP [30].

For  each  planning  domain,  we  tested  the  planners  on  a  large  suite  of  randomly-generated  test  problems  of  multiple  sizes, 
for  a  total  of  about  4500  planning  problems.  We  ran  both  NDP2  and  MBP  on  Intel  Xeon  2.33  GHz  processors  running  Red 
Hat  Enterprise  Linux  5.5.  We  gave  both  planners  2  hours  and  2  GB  of  RAM  to  complete  each  planning  problem. 

We  attempted  to  broaden  our  comparison  to  include  POND  [7]  and  GAMER  [11],  but  were  unable  to  do  so.  POND  does 
not  support  planning  problems  that  require  cyclic  solutions.  In  GAMER  we  ran  into  several  implementation  issues  that 
prevented  it  from  creating  proper  ground  versions  of  our  problems.  Thus,  despite  very  helpful  discussions  with  the  authors 
of  these  planners,  we  were  not  able  to  run  them  on  the  problems  in  our  test  suite. 

Three of the planning domains are everywhere-solvable, and all three of them are well-known from previous experimental studies:

• In the Robot Navigation domain with 7 kid doors (see Section 5), each action has $2^7$ possible outcomes. Thus in order to avoid a huge combinatorial explosion in the search space, it is essential for the planner to partition the states into a small number of classes and plan over those classes, rather than reason about each state individually. MBP’s BDD representation enables it to do such reasoning quite well in this domain [37,27], and we wanted to see whether our abstraction techniques would work well enough to make NDP2 competitive with MBP.

• In the Hunter-Prey domain [2,10], each action has roughly $5^n$ outcomes, where $n$ is the number of prey. Thus, although the number of locations is polynomial, the amount of nondeterminism for the hunter after each of its moves increases combinatorially with the number of prey in the domain. Our abstraction techniques do not work in this domain, and we wanted to see how this would affect NDP2’s performance.

•  In  the  Nondeterministic  Blocks  World  [26],  reasoning  over  sets  of  states  is  not  particularly  useful,  but  there  is  a  large 
number  of  goal  interactions  (e.g.,  deleted-condition  interactions)  to  deal  with.  Many  classical  planners  are  good  at 
reasoning  about  such  interactions,  and  we  wanted  to  see  if  NDP2  could  take  advantage  of  this. 

In  everywhere-solvable  planning  domains,  NDP2  calls  its  classical  planner  at  most  once  per  reachable  state,  because  the 
classical  planner  (assuming  it  is  complete)  will  never  return  failure.  But  in  planning  domains  that  contain  unsolvable  states, 
NDP2  may  need  to  call  the  classical  planner  many  times  per  state.  To  see  how  this  would  affect  NDP2’s  performance,  we 
compared  NDP2  with  MBP  on  three  planning  domains  that  contained  many  unsolvable  states: 

• The Exploding Blocks World has been used in several of the International Planning Competitions, e.g., [44,6]. For most
planning  problems  in  this  domain,  the  solution  must  include  actions  that  would  be  redundant  in  any  solution  to  the 
determinized  version  of  the  problem;  and  since  the  classical  planner  is  unlikely  to  generate  plans  that  include  those 
redundant  actions,  the  classical  planner  will  usually  return  plans  that  lead  to  unsolvable  states  in  the  original  problem  P. 
But  this  difficulty  is  mitigated  by  the  small  branching  factor  of  the  nondeterminism:  unlike  the  competition  version  of 
this  domain,  we  only  had  one  explosive  block. 

•  The  Tire  World  has  also  been  used  in  several  of  the  International  Planning  Competitions,  e.g.,  [44,6].  Like  the  Exploding 
Blocks  World,  it  requires  solutions  whose  actions  are  redundant  in  the  determinization.  On  one  hand,  Tire  World  has 
fewer  available  actions  per  state  than  the  Exploding  Blocks  World;  but  on  the  other  hand,  the  size  of  the  smallest 
solution  to  Tire  World  problems  grows  exponentially  with  the  number  of  locations  in  the  domain. 

•  Lost  in  Space  is  a  new  domain  that  we  developed  to  test  NDP2’s  subroutines  for  avoiding  unsolvable  states.  For  planning 
problems in this domain, the shortest solution for the determinized planning problem almost always leads to unsolvable states in the nondeterministic planning problem. Thus NDP2 must repeatedly modify its determinization of the planning
domain,  in  order  to  force  the  classical  planner  to  avoid  using  any  of  the  problematic  actions. 

In  the  Robot  Navigation  domain,  we  tested  the  planners  on  the  problems  in  our  test  suite,  and  also  on  abstract  and 
compound-abstract  versions  of  the  same  problems.  We  used  the  translation  algorithms  (Algorithms  5  and  6)  to  generate 
these  versions  of  the  problems.  We  did  not  bother  to  develop  computer  implementations  of  those  algorithms,  but  instead 
ran  them  by  hand. 

For the other planning domains in our experiments, we did not run separate experiments on abstract and compound-abstract versions of the problems, because those versions of the problems are identical to the original versions. The abstraction and compound-abstraction techniques modify a planning operator only when some of the operator’s nondeterministic outcomes differ by a single literal, and in those domains, every pair of nondeterministic outcomes differs by more than one literal.
Fig. 6. Average CPU times on Robot Navigation problems with 7 kid doors, as a function of the number of packages.


6.1.  Experiments  with  everywhere-solvable  domains 

Robot Navigation [9] The first set of experiments was in the Robot Navigation domain described previously, with $k = 7$ (i.e., all 7 doors were kid doors). We varied the number of packages $n$ from 1 to 10. For each value of $n$, we measured each planner’s average CPU time on 100 randomly-generated problems. As in [38], MBP’s CPU times include both its preprocessing and search times. Omitting the former would not have significantly affected the results, because the preprocessing times were never more than a few seconds, and usually below one second.

In addition to testing the algorithms on Robot Navigation problems, we also tested them on abstract and compound-abstract versions of the problems. We used the translation algorithms (Algorithms 5 and 6) to generate these versions,
performing  these  algorithms  manually  rather  than  running  them  on  the  computer.  Those  algorithms  are  easy  to  perform  by 
hand;  and  furthermore,  the  Robot  Navigation  domain  was  the  only  one  of  our  experimental  domains  in  which  we  needed 
to  use  them.  In  all  of  the  other  planning  domains  in  our  experiments,  the  abstract  and  compound-abstract  versions  are 
identical  to  the  original  domain. 

Fig.  6  shows  the  average  CPU  times  for  all  cases  where  a  planner  solved  all  100  problems  (a  meaningful  average  is 
impossible  if  the  planner  solves  some  of  the  problems  but  not  all  of  them).  The  labels  MBP  and  NDP2  are  for  the  original 
planning  problems,  MBP-A  and  NDP2-A  are  for  the  abstract  versions  of  the  problems,  and  MBP-CA  and  NDP2-CA  are  for  the 
compound-abstract  versions  of  the  problems.  We  discuss  the  results  below. 

MBP  did  worse  on  the  abstract  versions  of  the  problems  than  on  the  original  problems,  because  the  splitting  operators 
increased  the  branching  factor  of  MBP’s  search  space  by  creating  branches  in  MBP’s  BDD  structure  in  places  where  MBP 
would  not  ordinarily  have  created  branches.  MBP  did  better  on  the  compound-abstract  problems  than  the  abstract  ones, 
because  the  compound  operators  alleviated  the  search-space  blowup  caused  by  the  splitting  operators. 

Surprisingly,  MBP  did  better  on  the  compound-abstract  versions  of  problems  with  5  or  more  packages  than  on  the 
original  versions  of  those  problems.  This  puzzles  us,  but  we  suspect  the  compound-abstraction  helped  MBP  to  focus  its 
search  on  parts  of  the  search  space  that  were  relevant  for  finding  a  solution. 

On the original planning problems, where NDP2 had to reason about each of the 27 outcomes of each action, its performance
was quite poor. It solved all of the 1-package problems, and some of the 2- and 3-package problems, but no
problems  larger  than  that.  As  we  had  hoped,  NDP2  did  better  on  the  abstracted  versions  of  the  problems:  it  solved  all  of 
the  problems  with  3  or  fewer  packages,  and  some  problems  with  4  to  7  packages.  But  this  was  still  much  worse  than  MBP’s 
performance,  and  we  believe  it  is  because  FF’s  hill-climbing  algorithm  returned  plans  with  extraneous  split  actions  that 
produced  needless  branches  in  the  policy. 

In the compound-abstract planning problems, NDP2 did dramatically better: it completed problems with up to 10 packages,
and it outperformed MBP on problems with 7 or more packages. In the original problems, NDP2 had to call FF roughly
27  times  for  every  step  of  the  initial  weak  solution— but  in  the  compound-abstract  problems,  NDP2’s  number  of  calls  to  FF 
was  less  than  twice  the  number  of  steps  in  the  initial  weak  solution. 

Hunter-Prey  [2,10]  In  this  domain,  the  world  is  an  n  x  n  grid  in  which  a  hunter  wants  to  catch  one  or  more  prey.  The 
hunter has five possible actions: move north, south, east, or west, and catch (the latter is applicable only when the hunter
and prey are in the same location). Each prey also has five actions: the four movement actions plus a stay-still action. Like
the  kid  in  the  Robot  Navigation  domain,  the  prey  are  not  represented  as  separate  agents:  instead,  their  possible  actions  are 
encoded  as  nondeterministic  outcomes  of  the  hunter’s  actions. 

Fig. 7. Average CPU times in seconds in Hunter-Prey problems with one prey, as a function of grid size.

Fig. 8. Average CPU times in seconds in Hunter-Prey problems as a function of the number of prey, on a 5 x 5 grid.


Fig.  7  shows  running  times  when  there  is  just  one  prey  and  the  grid  size  varies  from  2  x  2  to  8  x  8,  and  Fig.  8  shows 
running  times  when  the  grid  size  is  fixed  at  5  x  5  and  the  number  of  prey  varies  from  1  to  5.  Each  data  point  is  the  average 
of  100  randomly  generated  problems. 

MBP’s  running  times  were  quite  good,  because  MBP’s  BDDs  did  quite  well  at  compressing  the  search  space.2  By  the 
nature  of  the  domain,  any  strong  cyclic  policy  must  cover  most  of  the  problem’s  reachable  states,  yet  MBP  could  use  a 
single  BDD  to  represent  the  set  of  all  states  in  which  the  hunter  needed  to  move  in  a  particular  direction. 

In  contrast,  NDP2  had  to  reason  about  each  of  those  states  separately.  When  there  was  just  one  prey,  the  number  of 
states,  and  thus  NDP2’s  running  time,  grew  polynomially  with  the  number  of  locations.  But  the  number  of  states  grew 
exponentially  with  the  number  of  prey,  so  NDP2  did  not  solve  any  problems  with  more  than  one  prey. 
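
The growth rates in the preceding paragraph can be checked with a rough count. The sketch below assumes that a state is fully described by the hunter's location plus one location per prey, ignoring any bookkeeping for prey that have already been caught; this is a simplification for illustration, not our exact encoding of the domain.

```python
def hunter_prey_states(n: int, num_prey: int) -> int:
    """Rough state count for an n x n grid (simplifying assumption:
    a state is the hunter's cell plus one cell per prey)."""
    locations = n * n
    return locations ** (1 + num_prey)


# One prey: polynomial (quadratic) in the number of locations.
one_prey = [hunter_prey_states(n, 1) for n in (2, 4, 8)]
# Fixed 5 x 5 grid: each additional prey multiplies the count by 25.
more_prey = [hunter_prey_states(5, p) for p in (1, 2, 3)]
```

Under this simplification, each additional prey on the 5 x 5 grid multiplies the number of states that NDP2 must treat separately by 25.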

Nondeterministic Blocks World [26] The nondeterministic Blocks World is like the classical Blocks World, except that an
action  may  have  three  possible  outcomes:  (1)  the  same  outcome  as  in  the  classical  case,  (2)  the  block  slips  out  of  the 
gripper  and  drops  on  the  table,  and  (3)  the  action  fails  completely  and  the  state  does  not  change.  Neither  of  the  abstraction 
techniques  can  be  used  in  this  domain,  for  the  same  reason  as  in  the  Hunter-Prey  domain. 

Fig.  9  shows  the  planners’  average  CPU  times  in  this  domain,  as  a  function  of  the  number  of  blocks.  Each  data  point 
represents  the  average  running  time  on  100  random  problems.  NDP2  outperformed  MBP  for  three  reasons: 

1.  There  were  no  large  sets  of  states  that  could  be  clustered  together;  hence  MBP’s  BDD-based  representation  could  not 
make  much  difference. 

2.  MBP  did  not  exploit  the  heuristics  used  in  the  classical  planners,  hence  MBP  searched  most  of  the  state  space  in  most 
planning  problems. 

3. Every action has three outcomes, but they are structured so that at least one of them (and often two) leads to a state
already  seen  by  the  planner.  Thus  the  number  of  calls  NDP2  must  make  to  FF  scales  linearly  with  the  number  of  blocks 
in  the  problem. 


2  MBP  did  not  work  as  well  on  Hunter-Prey  problems  in  [27,29],  because  those  papers  used  a  version  of  the  Hunter-Prey  domain  in  which  prey  could 
not  occupy  adjacent  squares— a  restriction  that  interfered  greatly  with  the  effectiveness  of  MBP’s  BDDs.  We  did  not  use  such  a  restriction  in  this  paper. 

Fig. 9. Average CPU times in the nondeterministic Blocks World, as a function of the number of blocks.


Fig.  10.  Average  CPU  times  in  Exploding  Blocks  World,  as  a  function  of  the  number  of  blocks. 


6.2.  Planning  domains  with  unsolvable  states 

Exploding  Blocks  World  [6]  The  nondeterministic  Exploding  Blocks  World  is  much  like  the  classical  Blocks  World  except  that 
there  may  be  one  or  more  exploding  blocks,  which  may  or  may  not  destroy  the  table  or  block  underneath  them  when 
they  are  put  down.  In  order  for  a  problem  to  have  a  solution,  there  must  be  enough  accessible  spare  blocks  to  defuse  the 
exploding  blocks.  In  any  solution,  a  spare  block  must  be  uncovered  and  placed  on  the  table  before  an  exploding  block  is 
moved.  Then  the  exploding  blocks  must  be  repeatedly  placed  on  the  spare  until  it  explodes,  making  it  safe  to  move  the 
exploding  block  elsewhere. 

Fig.  10  shows  the  completion  times  for  each  planner  when  there  is  a  single  exploding  block  and  a  single  spare  block,  with 
a  total  of  n  + 1  blocks  in  the  initial  state,  and  n  blocks  in  the  goal  state.  There  were  100  planning  problems  for  each  number 
of  blocks  between  3  and  8.  As  with  the  previous  blocks  world  variant,  NDP2  outperformed  MBP  for  all  but  the  smallest 
problems. Most likely, MBP performed poorly in the exploding blocks domain for much the same reasons it performed
poorly with nondeterministic blocks world problems, namely the lack of a heuristic function and the lack of clusterable states.

In  the  exploding  blocks  world,  the  only  nondeterministic  actions  are  actions  that  move  unexploded  blocks.  Thus  the 
amount  of  nondeterminism  is  lower  than  in  the  nondeterministic  blocks  world,  so  we  might  expect  NDP2  to  perform  much 
better  than  it  did  on  the  nondeterministic  blocks  world  problems.  However,  moving  an  exploding  block  before  defusing  it 
with  the  spare  block  leads  to  an  unsolvable  state,  and  there  is  no  reason  for  FF  to  avoid  this  sequence  of  events.  Even 
when  an  exploding  block  is  in  hand  and  a  spare  block  is  clear  and  on  the  table,  there  are  as  many  actions  available  in 
the  initial  state  which  lead  to  unsolvable  states  as  there  are  clear  blocks,  and  NDP2  may  need  to  rule  out  each  action  in 
turn.  Somewhat  surprisingly,  the  relative  lack  of  nondeterminism  balances  out  with  the  propensity  to  find  unsolvable  states, 
and  NDP2  performs  similarly  in  both  the  exploding  and  nondeterministic  blocks  world  variants,  despite  vastly  different 
structures  in  their  nondeterminism. 

Tire  World  [6]  Our  Tire  World  variant  consists  of  a  triangular  grid  of  connected  places  with  tires  interspersed  between  them, 
and  the  goal  is  to  move  the  car  from  the  initial  location  to  a  goal  location.  The  car  may  get  a  flat  tire  after  every  move, 
meaning  the  car  must  carry  a  spare  tire,  and  replace  it  once  it  is  consumed.  In  our  experiments,  we  added  tires  at  random 
to  the  initial  state  until  the  problem  was  solvable. 

Fig. 11. Completed problems (out of 100) in Triangle Tire World, as a function of the number of locations.

Fig. 12. Average CPU time in Triangle Tire World, as a function of the number of locations, for the cases (see Fig. 11) where the planner completed all 100
problems of that size.

As before, we tested the planners on 100 problems of each size. Fig. 11 shows how many problems of each size the
planners solved, and Fig. 12 shows the average CPU times on the problem sizes in cases where the planners solved all problems
of that size. MBP solved every problem with 3 to 21 locations, and generally solved these problems faster than NDP2,
but it did not solve any problem with more than 21 locations. NDP2 solved all problems with 3 to 10 locations, most of the
problems  with  15  to  45  locations,  and  some  problems  with  up  to  66  locations. 

In  each  Tire  World  problem,  the  number  of  states  in  the  smallest  strong  cyclic  solution  is  exponential  in  the  length  of  the 
shortest  solution  to  the  determinized  problem.  Furthermore,  many  times  the  shortest  determinized  solution  leads  through 
an  area  where  there  would  not  be  enough  spare  tires  if  any  flats  occur,  hence  the  nondeterministic  domain  contains  an 
exponential  number  of  unsolvable  states  (not  all  of  which  are  immediately  apparent). 

This  means  NDP2’s  running  time  is  potentially  doubly-exponential  due  to  the  number  of  calls  it  must  make  to  CP: 
exponential  in  the  length  of  the  shortest  determinized  solution,  and  exponential  in  the  difference  in  length  between  the 
shortest  successful  path  to  the  goal  if  no  flat  tires  occur,  and  the  length  of  the  shortest  successful  path  to  the  goal  if  a  flat 
tire occurs at every move. Consequently, for the problems of sizes 15 and 21, there were a few problems that NDP2 did not
solve  within  the  time  limit,  even  though  MBP  solved  all  100  problems  of  each  size.  This  is  why  Fig.  12  contains  data  points 
for  MBP  but  not  NDP2  at  those  sizes. 

On  the  other  hand,  many  of  the  problems  have  solutions  that  differ  only  slightly  from  the  shortest  path,  and  NDP2’s 
performance  is  “only”  exponential  in  the  length  of  that  path,  and  so  NDP2’s  indirect  use  of  FF’s  heuristic  function  enabled 
it  to  solve  some  of  the  planning  problems  all  the  way  up  to  size  66,  even  though  MBP  could  not  solve  any  problems  larger 
than  size  21. 

Lost in Space As we mentioned earlier, our original purpose in developing the Lost in Space (LiS) domain was to test NDP2’s
subroutines  for  avoiding  unsolvable  states.  But  the  domain  has  another  property  that  made  it  useful  for  our  experiments:  the 
domain  is  simple  enough  that  we  can  use  it  to  gauge  the  worst  case  performance  of  the  Find-Acceptable-Plan  subroutine. 

A  planning  problem  instance  in  the  LiS  planning  domain  is  a  simple  line  of  locations,  with  the  agent  at  one  end  and 
the  goal  at  the  other.  The  solution  to  an  LiS  planning  problem  is  a  policy  that  moves  the  agent  from  its  initial  location  to 
the goal. The agent can move between locations by using one of two actions: walking between connected locations, and
teleporting between any two locations, which can succeed or leave the agent lost and unable to move. This means that for
a problem with $n$ locations, there are $n + 1$ states, $n^2 + 2n - 2$ actions, and a single correct policy. Since the teleport action
in our determinization of LiS always leads to the goal, FF will almost always return plans that use it. Thus NDP2 should
have to make $O(n^3)$ calls to the classical planner to develop a policy for an LiS planning problem.

Fig. 13. Average CPU times of MBP and NDP2 on problems (out of 20) in Lost in Space, as a function of the number of locations.
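
The state and action counts for LiS can be reproduced by enumerating a small instance. The sketch below is only one possible reading of the domain: it assumes one walk action per ordered pair of adjacent locations and one teleport action per ordered pair of locations, including pairs whose source and destination coincide. That last assumption is ours, chosen because it yields the stated count; the tuple encoding of states and actions is likewise hypothetical.

```python
from typing import Tuple


def lis_counts(n: int) -> Tuple[int, int]:
    """Return (number of states, number of ground actions) for a line of n locations."""
    locations = range(n)
    # One state per location the agent can occupy, plus the "lost" state.
    states = [("at", i) for i in locations] + [("lost",)]
    # Walking is possible in either direction between neighbours: 2(n - 1) actions.
    walks = [("walk", i, j) for i in locations for j in locations
             if abs(i - j) == 1]
    # Assumption: teleport is grounded for every ordered pair, giving n^2 actions.
    teleports = [("teleport", i, j) for i in locations for j in locations]
    return len(states), len(walks) + len(teleports)


num_states, num_actions = lis_counts(5)  # 6 states, 33 actions
```

With these assumptions the totals are n + 1 states and n^2 + 2(n - 1) = n^2 + 2n - 2 actions.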

We  ran  both  NDP2  and  MBP  20  times  on  each  of  20  problem  instances  with  5  to  100  locations.  There  is  only  one 
problem  instance  for  each  problem  size  in  the  LiS  planning  domain,  but  we  ran  the  algorithms  20  times  on  each  instance  in 
order  to  reduce  statistical  variations  in  the  running  times— especially  the  running  times  of  NDP2’s  calls  to  FF,  which  makes 
some  random  choices  that  cause  its  running  time  to  vary. 

NDP2  was  able  to  solve  all  problems.  MBP  did  not  solve  problems  with  more  than  80  locations.  In  addition  to  the  results 
above,  we  also  report  the  average  CPU  times  of  the  planners  in  our  experiments.  Fig.  13  shows  the  average  CPU  time  for 
each  planner  per  size  of  problem.  As  expected,  FF  consistently  used  the  determinized  version  of  teleport  for  every  state  until 
ConstrainProblem  removed  that  option.  Both  NDP2  and  MBP  showed  sub-exponential  growth  of  CPU  time  in  the  number 
of  locations,  though  NDP2  has  a  slower  growth  rate,  overcoming  its  initial  disadvantage  for  problems  with  70  or  more 
locations. 

6.3.  Summary  and  discussion  of  the  experimental  results 

Here  is  a  quick  summary  of  the  results  in  each  domain,  along  with  our  understanding  of  the  reasons  for  those  results: 

•  In  the  Robot  Navigation  domain,  the  amount  of  nondeterminism  was  extremely  high.  Here,  NDP2’s  performance  against 
MBP  depended  on  how  good  a  way  we  gave  it  to  deal  with  the  nondeterminism.  Without  abstraction,  it  did  quite  badly. 
With  ordinary  abstraction  it  did  a  little  better,  and  with  compound  abstraction  it  did  much  better. 

•  In  the  Hunter-Prey  domain,  our  abstraction  techniques  weren’t  applicable,  so  we  couldn’t  give  NDP2  a  way  to  deal  with 
the  nondeterminism  in  this  domain.  Consequently,  NDP2  did  badly. 

•  In  the  Nondeterministic  Blocks  World  and  the  Exploding  Blocks  World  domains,  the  amount  of  nondeterminism  was 
relatively  small,  and  FF’s  search  heuristics  worked  well.  Thus  NDP2  did  much  better  than  MBP. 

•  In  the  Triangle  Tire  World  domain,  NDP2’s  performance  on  each  problem  depended  on  whether  the  plans  returned  by 
FF  contained  “bad”  actions  (i.e.,  actions  that  looked  good  in  the  determinized  domain  but  led  to  unsolvable  states  in 
the nondeterministic domain). Consequently, NDP2’s performance is in some ways better than MBP’s (e.g., how many
problems  it  could  solve),  and  in  some  ways  worse  than  MBP’s  (e.g.,  the  amount  of  CPU  time  it  used). 

•  What  happened  in  the  Lost  in  Space  domain  was  similar  to  what  happened  in  the  Triangle  Tire  World  domain.  But  in 
this  case,  the  number  of  bad  actions  in  this  domain  is  much  smaller,  so  NDP2  did  much  better  overall. 

7.  Related  work 

7.1.  Using  a  classical  planner  as  a  “black  box” 

There  have  been  several  other  works  that  proposed  to  use  classical  planning  algorithms  as  a  “black  box”  to  generate 
solutions  for  non-classical  planning  problems.  The  most  notable  one  is  FF-Replan  [42],  which  uses  the  FF  planner  [21]  to 
first  generate  a  plan  (i.e.,  a  weak  policy)  for  a  determinization  of  a  Markov  Decision  Process  (MDP).  Markov  Decision  Processes 
(MDPs)  are  like  nondeterministic  planning  domains  in  the  sense  that  each  action  can  have  more  than  one  possible  outcome, 
but  they  differ  from  the  latter  in  that  each  possible  outcome  of  an  action  has  a  probability  attached  to  it;  costs  and  rewards 
are  attached  to  the  actions  and  states,  respectively. 

FF-Replan introduced several determinization strategies for probabilistic PDDL actions; among these, all-effects determinization
is the basis for our determinization mechanism in NDP2. FF-Replan is an online replanning algorithm
that produces only a single execution and only works for everywhere solvable planning problems, whereas NDP2 generates,
offline, a solution for all possible outcomes of the nondeterminism in the execution, and it can deal with planning problems
that are not everywhere solvable. Both approaches have different advantages and disadvantages, as discussed in the literature
several times previously. NDP2 differs from FF-Replan in several ways: NDP2 can use any classical planning algorithm,
unmodified;  it  does  offline  generation  of  a  complete  solution  policy  rather  than  online  generation  of  a  single  execution 
trace;  and  it  finds  strong  cyclic  solutions  in  nondeterministic  domains. 
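
To make the determinization idea concrete, here is a minimal Python sketch in which each outcome of a nondeterministic action becomes its own deterministic action. The class names, the add/delete-list encoding, and the naming scheme for the generated actions are all assumptions made for this illustration; they do not describe the data structures of NDP2, FF, or FF-Replan.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple

Effect = Tuple[FrozenSet[str], FrozenSet[str]]  # (add list, delete list)


@dataclass(frozen=True)
class NDAction:
    name: str
    precond: FrozenSet[str]
    outcomes: Tuple[Effect, ...]  # one entry per possible outcome


@dataclass(frozen=True)
class DetAction:
    name: str
    precond: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]
    source: str  # nondeterministic action this was derived from


def all_effects_determinize(actions: Iterable[NDAction]) -> List[DetAction]:
    """Create one deterministic action per outcome of each action."""
    det = []
    for a in actions:
        for i, (add, delete) in enumerate(a.outcomes):
            det.append(DetAction(f"{a.name}__o{i}", a.precond, add, delete, a.name))
    return det


# Hypothetical LiS-style teleport: it either arrives or leaves the agent lost.
teleport = NDAction(
    "teleport-l0-l3",
    frozenset({"at-l0"}),
    (
        (frozenset({"at-l3"}), frozenset({"at-l0"})),
        (frozenset({"lost"}), frozenset({"at-l0"})),
    ),
)
det = all_effects_determinize([teleport])
```

Keeping the `source` field lets a plan found in the determinized problem be mapped back to the original nondeterministic actions when the policy is built.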

There are several relatively recent MDP planners that use a classical planner (typically FF) as a black box. An example
of this class of MDP planners is RFF [41]. Like FF-Replan, RFF uses FF to generate weak plans. It then runs several
Monte-Carlo simulations to determine the probability of the execution of a policy π ending in a non-goal π-result of an
initial  state.  RFF  then  uses  FF  to  generate  weak  plans  from  those  states,  integrates  them  into  the  policy,  and  reruns  the 
Monte-Carlo  simulations.  RFF  repeats  this  process  until  the  probability  of  an  execution  failing  falls  below  a  fixed  parameter. 
Although  both  RFF  and  NDP2  can  handle  dead-ends  in  planning  domains  and  incrementally  build  the  policy,  NDP2  does 
so  by  explicitly  and  symbolically  reasoning  about  them;  RFF  does  so  by  reasoning  about  failure  probabilities.  Thus,  each 
planner  has  access  to  different  kinds  of  knowledge  and  models. 
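
The Monte-Carlo step described above can be pictured with the following generic sketch, which estimates how often executing a partial policy ends in a non-goal state where the policy is undefined. It is an illustration of the idea only and is not RFF's implementation; the function names, the `sample_next` callback, the horizon cutoff, and the toy domain are all assumptions.

```python
import random
from typing import Callable, Dict, Hashable, Optional, Set

State = Hashable
Action = Hashable


def estimate_failure_probability(
    policy: Dict[State, Action],
    sample_next: Callable[[State, Action, random.Random], State],
    init: State,
    goals: Set[State],
    runs: int = 1000,
    horizon: int = 100,
    rng: Optional[random.Random] = None,
) -> float:
    """Fraction of simulated executions that reach a non-goal state
    for which the policy prescribes no action."""
    rng = rng or random.Random(0)
    failures = 0
    for _ in range(runs):
        s = init
        for _ in range(horizon):
            if s in goals:
                break
            if s not in policy:
                failures += 1
                break
            s = sample_next(s, policy[s], rng)
        # Runs that exhaust the horizon are not counted as failures here.
    return failures / runs


def advance_or_slip(s: int, a: str, rng: random.Random) -> int:
    """Toy outcome model: the action advances one step or leaves the state unchanged."""
    return s + 1 if rng.random() < 0.5 else s


complete = estimate_failure_probability({0: "go", 1: "go"}, advance_or_slip, 0, {2})
empty = estimate_failure_probability({}, advance_or_slip, 0, {2})
```

A planner in this style would keep extending the policy from the failing states until the estimate drops below its threshold.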

FF-Hindsight  [43]  is  also  an  MDP  planner  inspired  by  FF-Replan  which  uses  FF  as  part  of  its  heuristic  and  generates 
weak  solutions  to  a  planning  problem.  For  each  state  evaluated,  FF-Hindsight  creates  sets  of  time-varying  classical  planning 
problems  and  uses  FF  to  find  which  portions  are  solvable.  This  is  an  optimistic  measure  of  how  likely  it  is  that  the  agent 
can  reach  the  goal  from  this  state.  FF-Hindsight  then  uses  the  solvability  estimate  to  pick  which  action  is  most  likely  to  lead 
to  the  goal.  Although  the  idea  of  using  heuristics  to  generate  execution  paths  in  FF-Hindsight  is  similar  to  that  of  FF-Replan 
and  NDP2,  FF-Hindsight  differs  from  NDP2  in  that  it  does  not  generate  strong  cyclic  solution  policies. 

There  are  also  other  approaches  for  planning  with  nondeterministic  actions  based  on  the  idea  of  classical  planners.  The 
work described in [1] is also for non-probabilistic settings, but it is aimed at contingency planning in partially observable
domains. The work of [34] uses determinizations of probabilistic actions and uses classical planners to generate sequences of
actions  for  execution. 

FIP [14] is a recent NDP-inspired planner which shows a number of optimizations that can be done if the classical planner
is treated not as a black box, but as a glass box, directly incorporated into the planner. Optimizations include directly
removing state-action pairs from the domain (eliminating the need for Find-Acceptable-Plan), preferring deterministic operators,
and stopping the search for a weak plan when a solved state is found. An additional optimization, the goal-alternative
search,  would  be  easy  to  implement  in  NDP2,  but  it  requires  an  additional  call  to  Find-Acceptable-Plan,  which  is  already 
the  bottleneck  in  most  of  our  experiments.  Since  FIP  is  based  on  NDP,  it  has  NDP’s  plan  incorporation  bug  described  in 
Appendix  C,  but  it  should  be  straightforward  to  incorporate  our  fix  for  this  bug  into  FIP. 

Another recent work, the PrP planner described in [33], has made incremental extensions to some of the ideas in NDP. This work
introduces a definition of solution quality, and PrP looks for policies that are optimal according to that definition. Another
difference is that PrP’s implementation is based on the SAS+ formalism, whereas NDP2’s implementation uses a
non-probabilistic  version  of  PPDDL. 

The planner described in [22] generates cyclic solutions to partially observable planning problems by successively producing
linear plans (i.e., weak policies) and combining those plans into a conditional and cyclic plan, in a way similar to
our work. However, this work cannot use classical planners as NDP2 does; instead, it requires substantially rewriting those
planners  for  bookkeeping  for  policy  generation. 

The  GAMER  planner  [11]  translates  nondeterministic  problems  into  a  PDDL-like  language  for  describing  two-player  games 
and uses a game solver to find a solution. GAMER performed well in the ICAPS-08 planning competition, but a bug in its
grounding  process  prevented  us  from  running  it  in  our  experiments. 

7.2.  Other  planning  techniques  for  nondeterministic  planning  domains 

Probably the first work on planning in fully-observable nondeterministic domains is described in [15], which is a breadth-first
search algorithm over an AND-OR tree. Other early works on fully-observable nondeterministic domains include the
Cassandra planning system [39], CNLP [36], Plinth [18], UCPOP [35], and QBFPlan [40]. However, all these works describe
a special-purpose planning algorithm for nondeterministic planning domains, and thus do not focus on using classical
planners  as  a  black  box. 

One of the earliest attempts to use model-checking techniques for planning under nondeterminism was
the SimPlan planner of [25]. SimPlan is based on model-checking techniques that work over explicit representations
of states in the state space; i.e., the planner represents and reasons explicitly about every state visited during the search.
Symbolic model-checking approaches to planning in nondeterministic domains were first introduced in [17,9]. MBP is one
of  the  best  planners  that  uses  Binary  Decision  Diagrams  (BDDs)  for  this  purpose. 

UMOP [23,24] exploits some of the ideas from the MBP planner as a starting point for multi-agent planning, and combines
BDDs with a heuristic-search algorithm for strong and strong cyclic planning [24]. Heuristic search provides some
performance improvements over unguided BDD-based planning such as MBP’s, on some examples that are simpler than those MBP was
tested on. For this reason, we have not compared UMOP to NDP2 in this paper; the authors of UMOP discussed and
suggested some possibilities for scaling their approach to larger problems.

ND-SHOP2  [26]  uses  HTN  planning  techniques  to  control  the  search  space  in  nondeterministic  planning.  ND-SHOP2 
showed  how  HTN  knowledge  could  improve  nondeterministic  planning  performance,  and  performed  competitively  with 
MBP.  Yoyo  [29]  extended  this  line  of  work  by  combining  HTN  planning  with  a  compact  BDD  state  representation  to  get 
several orders of magnitude in performance gains over ND-SHOP2 and MBP. Both of these planners use domain-specific
planning  knowledge  to  organize  the  search  space  while  generating  solution  policies.  Unlike  them,  NDP2  relies  solely  on  the 
classical  planner’s  domain-independent  heuristic  search  capabilities. 

Planners such as MBP [4], POND [7] and Contingent-FF [20] can generate solution policies for partially observable planning
problems. Most of them cannot generate cyclic solutions, except for an extended version of MBP [3], which can generate
strong  cyclic  solutions  to  a  class  of  partially  observable  problems.  We  believe  the  ideas  in  NDP2  could  also  be  generalized 
to  partial  observability. 

Finally,  [13]  reports  an  approach  for  analyzing  deterministic  planning  domains  and  identifying  structural  features  and 
dependencies  among  those  features  using  model-checking  techniques.  Although  this  approach  has  some  similarities  to  our 
pairwise effect abstraction technique, their approach focuses on using the results of a domain analysis to prune the search
space whereas we use pairwise effect abstractions for state-space compression. It would be interesting to investigate as
future work whether the domain analysis method can be used for identifying more general and effective features for state
compression. 

7.3.  A  final  note  on  MDPs 

MDP  problems  for  control  theory  and  operations  research  do  not  usually  include  a  notion  of  goal  states;  when  they  do, 
they  are  usually  formulated  as  stochastic shortest-path  (SSP)  problems.  See  [32]  for  an  excellent  survey  of  MDP  planning  and 
planning  techniques  from  an  AI  perspective. 

In SSPs, every action has nonzero probabilities for all of its outcomes, whence the probability that an execution never leaves
a cycle is zero. Algorithms for solving SSP problems attempt to compute a policy that will achieve the goals with
probability 1 [31]. Note also that this property is analogous to the “fairness” assumption in strong-cyclic solutions in
nondeterministic planning domains [9] (and as also defined in Section 2.1 in this paper).

SSPs  can  be  solved  either  by  MDPs  or  by  nondeterministic  planning  models,  and  the  planners  using  the  latter  have  been 
shown empirically to be more efficient on such problems [5]. The primary reason is that planners that use nondeterministic
models  do  less  search  than  MDP  planners  because  they  are  not  looking  for  optimal  solutions. 

8.  Conclusions 

NDP2,  like  the  earlier  NDP  algorithm  [30],  solves  nondeterministic  planning  problems  by  calling  a  classical  planner  on 
a  sequence  of  deterministic  planning  problems,  and  using  the  classical  planner’s  plans  to  construct  a  strong  cyclic  solution 
policy  for  the  nondeterministic  problem.  However,  in  order  to  avoid  NDP’s  difficulties  with  unsoundness  and  combinatorial 
explosion  in  the  presence  of  unsolvable  states,  NDP2  has  a  different  (and  provably  correct)  way  of  dealing  with  unsolvable 
states. 

We  also  have  provided  algorithms  to  translate  a  planning  problem  P  into  two  different  “abstract”  versions  of  P  in  which 
there  are  states  that  represent  sets  of  P’s  states.  These  overcome  another  limitation  of  [30],  which  described  a  similar 
“conjunctive  abstraction”  technique  without  providing  an  algorithm  to  compute  it.  The  well-known  MBP  planner  uses  BDDs 
to  compute  abstractions  that  are  significantly  more  powerful  than  ours— but  since  our  abstractions  do  not  use  BDDs,  they 
preserve  NDP2’s  ability  to  be  used  with  any  classical  planner. 

NDP2’s  primary  advantage  over  MBP  is  that  MBP  uses  none  of  the  sophisticated  search  heuristics  used  in  classical 
planners,  hence  can  sometimes  visit  many  more  states  than  it  needs  to.  Since  NDP2  uses  a  classical  planner  as  a  subroutine, 
the  classical  planner’s  search  heuristics  can  sometimes  help  NDP2  to  visit  significantly  fewer  states  than  MBP.  This  happened 
in  the  Robot  Navigation  domain  and  the  Nondeterministic  Blocks  World,  where  NDP2  did  much  better  than  MBP. 

NDP2’s  main  disadvantage  compared  to  MBP  is  that  in  many  of  the  cases  where  MBP’s  BDDs  can  represent  a  set  of  states 
as  a  single  abstract  state,  our  abstraction  algorithms  cannot  do  so.  Thus  there  are  cases  where  NDP2  must  plan  for  different 
states  separately  but  MBP  can  plan  for  the  entire  set  of  states  at  once.  This  happened  in  our  Hunter-Prey  experiments, 
where  MBP  performed  much  better  than  NDP2. 

Our  experimental  results  with  Exploding  Blocks  World,  Tire  World,  and  Lost  in  Space  showed  that  NDP2’s  technique  for 
avoiding  unsolvable  states  works  quite  well:  NDP2  completed  nearly  every  problem  that  MBP  completed,  and  many  more 
that MBP could not complete. In the Exploding Blocks World and Lost in Space domains, where both planners completed
enough  problems  to  compare  speed,  NDP2  completed  large  problems  much  faster  than  MBP. 

Future  work  Since  MBP's  BDD-based  abstractions  give  it  an  advantage  in  some  cases,  and  NDP2’s  access  to  classical  search 
heuristics  gives  it  an  advantage  in  other  cases,  it  might  be  possible  to  obtain  better  performance  than  both  MBP  and  NDP2 
by  writing  an  NDP2-like  planner  that  incorporates  an  FF-like  algorithm  operating  over  BDDs,  or  by  finding  other  ways  to 
combine  BDDs  and  relaxed  planning  graphs.  Existing  work  such  as  [7]  has  already  investigated  ways  to  combine  planning 
graphs  and  BDDs,  but  these  approaches  typically  require  complicated  and  potentially  exponential  representations  due  to  the 
mutex  conditions  in  planning  graphs,  which  degrade  the  abstraction  capabilities  in  BDDs.  Since  FF’s  relaxed  planning  graphs 
do  not  include  mutex  conditions,  they  might  be  a  better  fit  for  BDDs. 

Further  improvements  may  also  be  achievable  by  coupling  NDP2  and  FF  more  tightly.  When  NDP2  calls  FF,  it  must  wait 
until  FF  reaches  a  goal.  If  we  could  intervene  to  stop  FF  as  soon  as  it  reaches  a  state  that  is  already  part  of  NDP2’s  current 
partial solution, this would provide a substantial speedup because it would prevent FF from wasting time retracing large
parts  of  the  solutions  that  it  found  during  the  previous  times  NDP2  called  it. 

We note that some MDP planning algorithms (e.g., LAO* [19]) can generate cyclic solution policies. With proper modifications
to these planners and their inputs, it would be interesting to compare them with NDP2 and classical planners. This
may  provide  a  path  toward  developing  an  NDP2-like  algorithm  for  MDPs. 
Appendix A. Theoretical properties

This  appendix  provides  the  theoretical  properties  of  the  NDP2  planning  procedure,  its  subroutines  Find-Acceptable-Plan 
and  ConstrainProblem,  and  the  abstraction  and  compound  abstraction  techniques.  Most  of  the  lemmas  in  this  appendix  are 
not  mentioned  in  the  body  of  the  paper,  but  they  are  used  in  the  proofs  of  the  theorems. 

Lemma 1. A nondeterministic planning problem $P = (D, S_0, G)$ is everywhere weakly solvable iff it is everywhere strong cyclically
solvable.

Proof. ($\Rightarrow$): Let $s_0, \ldots, s_k$ be the set of states reachable from $s_0$. Since $P$ is everywhere weakly solvable, let $p_0, p_1, \ldots, p_k$
be a set of weak solutions, one for each $s_i$.

Let $\pi_0$ be the policy formed by setting $\pi_0(s) = a$ for every $(s, a) \in p_0$. By construction, if $\pi_0(s)$ is defined, there is a goal
$\pi_0$-descendant of $s$.

Let $\pi_{i+1}$ be the policy formed by setting $\pi_{i+1}(s) = \pi_i(s)$ for every state $s$ such that $\pi_i(s)$ is defined, and $\pi_{i+1}(s) = a$ for
every state $s$ such that $\pi_i(s)$ is not defined and $(s, a) \in p_{i+1}$. Again, by construction, every state for which $\pi_{i+1}$ is defined has
a goal $\pi_{i+1}$-descendant.

Since the $\pi_k$-descendants of $s_0$ are a subset of the states reachable from $s_0$, every $\pi_k$-descendant of $s_0$ has a path to the
goal, and $\pi_k$ is a strong cyclic solution to $P$.

($\Leftarrow$): Suppose $P$ is everywhere strong cyclically solvable and let $s$ be a state in $P$ reachable from $S_0$. If $P$ were not
everywhere weakly solvable, then there would exist at least one state $s'$ that is reachable from $s$ but has no path
to a goal. But this is a contradiction by definition of strong cyclic solutions. □
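The construction of $\pi_k$ in the ($\Rightarrow$) direction can be illustrated with a minimal Python sketch. It assumes (for illustration only) that each weak solution is given as a dictionary from states to actions; the function name `merge_weak_solutions` and the example states are hypothetical and do not appear in the paper.

```python
def merge_weak_solutions(weak_solutions):
    """Build pi_k from weak solutions p_0, ..., p_k (each a dict: state -> action).

    Earlier assignments win, mirroring pi_{i+1}(s) = pi_i(s) wherever pi_i(s)
    is already defined; p_{i+1} only fills in states that are still undefined.
    """
    policy = {}
    for p in weak_solutions:
        for state, action in p.items():
            policy.setdefault(state, action)  # keep pi_i(s) if it exists
    return policy


# Hypothetical weak solutions: p0 reaches the goal from s0, p1 from s2.
p0 = {"s0": "a", "s1": "b"}
p1 = {"s2": "c", "s1": "d"}
pi = merge_weak_solutions([p0, p1])
```

Here `p1` proposes a different action for `s1`, but the merged policy keeps the action from `p0`, from which `s1` already has a goal descendant.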

Lemma 2. For every state $s$ in a nondeterministic planning problem $P$, $s$ is weakly solvable in $P$ if and only if it is solvable in $\overline{P}$.

Proof. Let $P = (D, \{s_0\}, G)$ be a nondeterministic planning problem, and $\overline{P}$ be a determinization of $P$.

($\Rightarrow$): Let $s$ be a state in $D$ and suppose $(D, \{s\}, G)$ has a weak solution $\pi = \{(s_{i-1}, a_i)\}_{i=1}^{n}$ where $s_0, \ldots, s_n$ is the
sequence of states produced by $\pi$ in $D$. Let $p = (a'_1, \ldots, a'_n)$ be a plan such that each $a'_i$ is a determinization of $a_i$ and
$\gamma_{\overline{D}}(s_{i-1}, a'_i) = \{s_i\}$; such actions exist since $\pi$ is a weak solution for $(D, \{s\}, G)$. By construction, the plan $p$ is a solution for the classical planning
problem $(\overline{D}, s, G)$.

($\Leftarrow$): Suppose $\overline{P}$ has a solution plan $p = (a_1, \ldots, a_n)$. It follows from the way determinizations are constructed that $\pi$ is
a weak solution for $P$: if $a_i$ is applied in the state $s_{i-1}$ in $p$, then $(s_{i-1}, a'_i) \in \pi$, where $a'_i$ is the nondeterministic action of which $a_i$ is a determinization. □
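The correspondence used in this proof can be sketched in Python. The sketch assumes an all-outcomes style determinization in which each nondeterministic action yields one deterministic action per outcome; the data layout, the naming scheme `a3_1`, `a3_2`, and the tiny domain are illustrative assumptions, not the paper's implementation.

```python
def gamma(nd_actions, state, name):
    """Nondeterministic successor set of action `name` in `state`."""
    return {outcome[state] for outcome in nd_actions[name] if state in outcome}


def determinize(nd_actions):
    """Create one deterministic action per outcome of each nondeterministic action."""
    det = {}
    for name, outcomes in nd_actions.items():
        for i, outcome in enumerate(outcomes, start=1):
            det[f"{name}_{i}"] = outcome  # hypothetical naming scheme
    return det


def run_classical_plan(det_actions, state, plan):
    """Apply a sequence of deterministic actions and return the final state."""
    for det_name in plan:
        state = det_actions[det_name][state]
    return state


# Hypothetical domain: a3 applied in s1 leads either to the goal sg or to s3.
ND = {"a0": [{"s0": "s1"}], "a3": [{"s1": "sg"}, {"s1": "s3"}]}
DET = determinize(ND)
final_state = run_classical_plan(DET, "s0", ["a0_1", "a3_1"])
weak_policy = {"s0": "a0", "s1": "a3"}  # policy image of the classical plan
```

The classical plan reaches the goal, and the corresponding policy is only a weak solution because `a3` may also lead to `s3`.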

Lemma 3. Let CP be a sound classical planner that is guaranteed to terminate. Then Find-Acceptable-Plan returns in at most
$|S| \cdot |A| + 1$ calls to CP, where $S$ and $A$ are the set of states and actions in the classical domain, respectively.

Proof.  To  prove  the  bounds  in  the  lemma,  we  need  to  show  that  after  every  call  to  CP,  if  Find-Acceptable-Plan  did  not  exit 
then  it  adds  a  new  state-action  pair  to  B.  From  this  it  follows  that  since  there  is  only  a  finite  number  of  states  and  actions, 
Find-Acceptable-Plan  must  eventually  return. 

Note that the only time a state is removed from S is when CP returns failure, after which Find-Acceptable-Plan adds the
state to K (Line 21). This means that once NDP2 adds a state to S, it either stays in S or is moved to K. Since Line 12
forbids adding a state to S which is already in either S or K, every state in S is unique, and we never add a state to S more
than once.

Now we need to show that after every call to CP, Find-Acceptable-Plan either returns success or failure, or adds a new
state-action pair to B. Look at what happens when CP returns a plan $(a_0, \ldots, a_n)$ from the current state $s$ to the goal,
going through states $(s_1, \ldots, s_{n+1})$. Since CP is sound, $s_{n+1}$ is a goal state. If Find-Acceptable-Plan accepts the whole plan,
Find-Acceptable-Plan will return success on the next iteration.

Suppose Find-Acceptable-Plan rejects the first action $a_0$. We know $(s, a_0) \notin B$, since ConstrainProblem prevents those
actions from being applicable as the first action in the plan. So if Line 12 rejects $a_0$, then $(s, a_0)$ is a new pair added to B.
Otherwise, $a_0$ is added to the current plan, and the current state is set to $s_1$.

Note that everywhere Find-Acceptable-Plan adds state-action pairs to B, the state part of the pair is the last state in S.
So suppose Find-Acceptable-Plan accepts actions $a_0, \ldots, a_i$, and rejects action $a_{i+1}$. Since Find-Acceptable-Plan only accepts
actions that lead to states never before in S, the state-action pairs $(s_1, a_1), \ldots, (s_i, a_i), (s_{i+1}, a_{i+1}) \notin B$. And so $(s_{i+1}, a_{i+1})$ is
a new state-action pair added to B.

Now look at what happens when CP returns failure. Find-Acceptable-Plan removes the last action $a'$ from the plan, and,
if it does not return failure, adds the pair $(s', a')$ to B, where $s'$ is the previous state in the current plan. From above, we
know that when we added $a'$ to the plan, $(s', a') \notin B$. Furthermore, since Find-Acceptable-Plan added $a'$ to $p$, $s'$ has not
been the last state in S until now, so $(s', a')$ is still not in B. So $(s', a')$ is a new pair added to B. □

Having shown termination, we can now show that Find-Acceptable-Plan returns failure or returns an acyclic plan whose
policy image avoids the states in $U$. As shorthand, we call these plans $U$-acceptable. More formally, a plan $p$ is $U$-acceptable
in a state $s$ with respect to a nondeterministic domain $D$, its determinization $\overline{D}$, a classical planning problem $(\overline{D}, s, G)$ and
a set of states $U$ if:

• $p$ is applicable in the state $s$ in the classical planning domain $\overline{D}$.

• It is acyclic (no repeated states).

• For every state-action pair $(s_i, \bar{a}_i)$ in the image of $p$ on $(\overline{D}, s, G)$, let $a_i$ be the corresponding nondeterministic action
in $D$. Then $\gamma_D(s_i, a_i) \cap U = \emptyset$.
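The three conditions can be checked mechanically. The following Python sketch is one way to do so under stated assumptions: the transition table `DET`, the action naming scheme, and the helper names are hypothetical and serve only to make the definition concrete.

```python
def nd_action_of(det_action):
    """Map a determinized name such as 'a3_2' back to 'a3' (naming is an assumption)."""
    return det_action.rsplit("_", 1)[0]


# Hypothetical determinized transitions: (state, deterministic action) -> next state.
DET = {
    ("s0", "a0_1"): "s1",
    ("s1", "a3_1"): "sg",
    ("s1", "a3_2"): "s3",
    ("s1", "a1_1"): "s2",
    ("s2", "a2_1"): "s0",
}


def nd_gamma(state, nd_action):
    """All outcomes of the nondeterministic action in the given state."""
    return {nxt for (s, d), nxt in DET.items()
            if s == state and nd_action_of(d) == nd_action}


def is_u_acceptable(plan, start, unsolvable):
    """Check applicability, acyclicity, and avoidance of the states in U."""
    state, visited = start, {start}
    for det_action in plan:
        nxt = DET.get((state, det_action))
        if nxt is None:                      # action not applicable here
            return False
        if nd_gamma(state, nd_action_of(det_action)) & unsolvable:
            return False                     # some outcome lands in U
        if nxt in visited:                   # repeated state: plan is cyclic
            return False
        visited.add(nxt)
        state = nxt
    return True
```

Note that the third condition inspects every nondeterministic outcome of the action, not only the outcome that the classical plan follows.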


Lemma 4. If CP is sound and guaranteed to terminate, then Find-Acceptable-Plan is sound and it returns either a failure or a
$U$-acceptable plan that ends in a goal state.

Proof. Since Find-Acceptable-Plan checks if the last state in its partial plan $p$ reaches the goal before returning a plan, it is
enough to show that the partial plan $p$ in the procedure is always a $U$-acceptable plan.

Since any prefix of a $U$-acceptable plan is still $U$-acceptable, we can do induction on the size of $p$ in Find-Acceptable-Plan,
looking only at additions to $p$. In the base case, $p$ is empty, meeting the $U$-acceptable requirements trivially.

By the inductive hypothesis, assume $p$ is $U$-acceptable and Find-Acceptable-Plan is adding an action $a$ to $p$. Since the
only location for this is in Line 12, Find-Acceptable-Plan has already checked in Line 12 that it does not form a loop in $p$,
and that its corresponding action in $D$ does not lead to any state in $U$. If CP is sound, then $a$ is applicable in $s$. So $a$
appended to $p$ is a $U$-acceptable plan. □

Before we can show the completeness of Find-Acceptable-Plan, we need a utility lemma that says we can take
$U$-acceptable plans from states $a$ to $b$ and $b$ to $c$ to produce a $U$-acceptable plan from $a$ to $c$. Note that the concatenation
of any two $U$-acceptable plans may not be $U$-acceptable, since the resultant plan may visit some states twice.

Lemma 5. With a nondeterministic domain $D$, its determinization $\overline{D}$, a classical planning problem $(\overline{D}, s_0, G)$ and a set of states $U$, let
$p$ be a $U$-acceptable plan from $s_0$ to some state $s_1$ and $p'$ be a $U$-acceptable plan from $s_1$ to some state $s_2$. Let $S$ be a directed graph
where the nodes are the states associated with $p$ and $p'$ and the edges are the actions from $p$ and $p'$.

Then any acyclic path from $s_0$ to $s_2$ in $S$ corresponds to a $U$-acceptable plan from $s_0$ to $s_2$.

Proof. Let $p''$ be any acyclic path through $S$. Then $p''$ is $U$-acceptable since $p''$ only goes through state transitions appearing
in $p$ applied at $s_0$ or $p'$ applied at $s_1$, $p''$ is by definition acyclic, and no action in $S$ leads to a state in $U$. □
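To make Lemma 5 concrete, here is a small Python sketch that builds the graph $S$ from two plans and extracts an acyclic path with breadth-first search (a shortest path never repeats a state). The function name `splice_plans`, the edge-triple encoding, and the example plans are assumptions made for illustration.

```python
from collections import deque


def splice_plans(edges_p, edges_q, start, end):
    """Return the actions along an acyclic path from start to end, or None.

    Each plan is a list of (state, action, next_state) triples.
    """
    graph = {}
    for src, action, dst in edges_p + edges_q:
        graph.setdefault(src, []).append((action, dst))

    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == end:
            break
        for action, dst in graph.get(state, []):
            if dst not in parent:
                parent[dst] = (state, action)
                queue.append(dst)
    if end not in parent:
        return None

    plan, state = [], end
    while parent[state] is not None:       # walk back to the start state
        state, action = parent[state]
        plan.append(action)
    return plan[::-1]


# p goes s0 -> x -> s1 and p' goes s1 -> x -> s2, so their concatenation visits x twice.
p = [("s0", "a", "x"), ("x", "b", "s1")]
p_prime = [("s1", "c", "x"), ("x", "d", "s2")]
spliced = splice_plans(p, p_prime, "s0", "s2")
```

In this example the concatenated plan is not acyclic, but the path found in the graph skips the detour through `s1`.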

Now  we  can  show  that  Find-Acceptable-Plan  is  not  only  sound  and  guaranteed  to  terminate,  but  also  complete: 

Lemma  6.  If  CP  is  sound  and  complete,  Find-Acceptable-Plan  is  complete. 

Proof. To show that Find-Acceptable-Plan is complete, it is enough to show that Find-Acceptable-Plan never backtracks from
a state along a $U$-acceptable path to a goal. We show this by contradiction.

Suppose Find-Acceptable-Plan is backtracking for the first time from a partial plan $p$ with associated states $s_0, \ldots, s$ from
which there is a $U$-acceptable plan to the goal. Since $p$ is $U$-acceptable, there is an action $a$ applicable in $s$, the last state
of $p$, which is along a $U$-acceptable path to the goal. Since CP is complete, Find-Acceptable-Plan added the transition $(s, a)$ to
B sometime before hitting Line 21.

Now we reason about how and when $(s, a)$ appeared in B. There are two locations in Find-Acceptable-Plan where B is
modified. Either $(s, a)$ must have been added via Line 13 or Line 21. We now show contradictions in four cases:

1. $(s, a)$ was added to B with a partial plan that was a strict prefix of $p$.

2. $(s, a)$ was added to B with the partial plan $p$.

3. $(s, a)$ was added to B with a partial plan $p'$ where $p$ is a strict prefix of $p'$.

4. $(s, a)$ was added to B with a partial plan $p'$ where neither $p$ nor $p'$ is a prefix of the other.

Case 1. $(s, a)$ was added to B with a partial plan that was a strict prefix of $p$. Since, by the soundness of Find-Acceptable-Plan
(Lemma 4), $p$ is irredundant, a proper prefix will not go through $s$, which is a necessary condition to hit both
Line 13 and Line 21.

Case 2. $(s, a)$ was added to B with the partial plan $p$. For Line 13, since $a$ appended to $p$ is part of a $U$-acceptable path, it
will not create a cycle, and its policy image contains no states in $U$, so the conditional on Line 12 would prevent
Find-Acceptable-Plan from reaching that line. For Line 21, this would mean that the state $s$ immediately precedes itself
in $p$, which violates the soundness lemma for Find-Acceptable-Plan.

Case 3. $(s, a)$ was added to B with a partial plan $p'$ where $p$ is a strict prefix of $p'$. Say $(s, a)$ was added to B while
Find-Acceptable-Plan had a partial plan $p'$ with a prefix of $p$ that goes through states $s_0, \ldots, s, \ldots, s'$. To ban $(s, a)$ in
Line 13, CP must have produced a plan that goes from $s'$ through $s$ and then some $s'' = \gamma_{\overline{D}}(s, a)$, which forms a cycle.
But since $s$ is already in S, Line 12 would have stopped integrating the plan when it hit the action that led to $s$.

To ban $(s, a)$ in Line 21, we would have to be planning from a state directly following $s$ with the previous action $a$.
By the soundness lemma the current plan is irredundant, so $p'$ must be $a$ appended to $p$. Since this is also along a
$U$-acceptable path to the goal, it violates our assumption that the first backtrack from a $U$-acceptable path happened
with a current plan of $p$.

Case 4. $(s, a)$ was added to B with a partial plan $p'$ where neither $p$ nor $p'$ is a prefix of the other. In order to place $(s, a)$ in B, $p'$
must either terminate at $s$ for Line 13 or terminate at $s_a = \gamma_{\overline{D}}(s, a)$ for Line 21. In either case, $p'$ is $U$-acceptable. Since
$p$ is $U$-acceptable and $s$ is along a $U$-acceptable path to the goal, there is a plan $p_g$ from $s$ to the goal such that
$p$ adjoined $p_g$ is a $U$-acceptable plan. Since $p'$ is $U$-acceptable, by Lemma 5, one can construct a $U$-acceptable plan
$p_s$ from just the states and actions in $p'$ and $p_g$. Since $p$ adjoined $p_g$ must be irredundant, $p_g$ must not lead to any
state in $p'$ that also appears in $p$. Let $s_d$ be the first state along $p'$ that differs from the states along $p$ or the last state
along $p'$ if no states differ. Since $p'$ is irredundant and $p_g$ references no states in $p'$ before $s_d$, $p_s$ must go through $s_d$.
However, this means that Find-Acceptable-Plan must have backtracked past $s_d$ when there was a $U$-acceptable plan to
the goal from that point. So the backtracking at $s$ is not the first time, which contradicts our assumption.

Therefore, since any backtracking along a $U$-acceptable path to the goal causes a contradiction, Find-Acceptable-Plan is
complete. □

Theorem 1. Let CP be a sound and complete classical planner, $U$ be a set of states, $D$ be a nondeterministic planning domain, and
$\overline{D} = (\mathcal{L}, O)$ be the determinization of $D$. Let $S$ be the set of all states in $\mathcal{L}$ (i.e., $S = 2^F$, $F$ = {all ground atoms over $\mathcal{L}$}), and $A$ be the
set of all possible actions (i.e., all possible ground instantiations of the planning operators in $O$).

Then Find-Acceptable-Plan($D$, $\overline{D}$, $s_0$, $G$, CP, $U$) makes at most $|S| \cdot |A| + 1$ calls to CP, and returns an acyclic plan, if such a plan
exists, whose policy image in $D$ avoids the states in $U$.

Proof.  Immediately  follows  from  Lemmas  3,  4,  and  6.  □ 

Lemma 7. If CP is sound and guaranteed to terminate, then NDP2 returns in at most $|S|^2$ calls to Find-Acceptable-Plan, where $S$ is the
set of states in the domain.

Proof. By Lemma 4, we have that Find-Acceptable-Plan is sound and terminates. Every iteration of NDP2 selects $s$, a non-goal
$\pi$-result of $S_0$, and either produces a weak plan from that state to a goal state, or fails to find a plan, and adds $s$
to $U$.

If NDP2 found a plan for $s$, it will not be a non-goal $\pi$-result of $S_0$ again unless NDP2 adds a child of $s$ to $U$. Since there
are finitely many states, there can be at most $|S|$ many iterations of the main planning loop before NDP2 either returns or
adds a state to $U$.

Once a state is in $U$, since Find-Acceptable-Plan is sound, no action added to the policy will lead to that state. So again,
NDP2 can only add at most $|S|$ states to $U$ before there is no path from any leaf state to a goal that does not lead to a state
in $U$.

With at most $|S|$ iterations between adding a state to $U$ and at most $|S|$ additions to $U$, NDP2 must return in at most
$|S|^2$ calls to Find-Acceptable-Plan. □

As written, this means that NDP2 will make $O(|S|^3 \cdot |A|)$ calls to CP. Notice, however, that we only add states to $U$. This
means that in Find-Acceptable-Plan, we can cache B and K per starting state (caching $p$ will not be helpful). This means
that Find-Acceptable-Plan will only call CP $O(|S| \cdot |A|)$ times per starting state, which reduces NDP2's number of calls to CP
to $O(|S|^2 \cdot |A|)$.
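The two bounds follow by multiplying the bound of Lemma 3 by the number of times Find-Acceptable-Plan can do fresh work. A worked version of the arithmetic, with a purely illustrative instance ($|S| = 8$, $|A| = 3$, chosen only as an example), is:

```latex
% Without caching: Lemma 7 (|S|^2 calls) times Lemma 3 (|S||A| + 1 calls to CP each).
\[
  |S|^2 \cdot \bigl(|S| \cdot |A| + 1\bigr) = |S|^3 |A| + |S|^2 = O\bigl(|S|^3 \cdot |A|\bigr)
\]
% With B and K cached per starting state: at most |S| starting states.
\[
  |S| \cdot \bigl(|S| \cdot |A| + 1\bigr) = |S|^2 |A| + |S| = O\bigl(|S|^2 \cdot |A|\bigr)
\]
% Illustrative instance with |S| = 8 and |A| = 3:
\[
  8^2 \cdot (8 \cdot 3 + 1) = 1600
  \qquad\text{versus}\qquad
  8 \cdot (8 \cdot 3 + 1) = 200
\]
```

These are worst-case counts over explicit state sets; they say nothing about typical behavior on the benchmark domains.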
Lemma 8. If CP is sound, then after each iteration of NDP2, there are no inescapable cycles in $\pi$. That is, every $\pi$-descendant state of
the initial state has a path to a $\pi$-result of the initial state.

Proof. The proof is by induction on the changes to $\pi$.

When $\pi$ is empty, the initial state is a $\pi$-result of itself, so the lemma is trivially true. For the induction step, there are
two ways NDP2 can change $\pi$:

1. Merging a plan from Find-Acceptable-Plan to $\pi$ (Line 11).

2. Find-Acceptable-Plan returns failure (Line 19).

Case 1. Merging a plan from Find-Acceptable-Plan to $\pi$. NDP2 will merge the plan until it reaches the end of the plan, which
by Lemma 4 is a goal state, or reaches a state in $\pi$ which has a goal state $\pi$-descendant. In either case, all modified
states in $\pi$ and any state that had modified states as $\pi$-results now have a path to a goal $\pi$-result of $S_0$.

Notice that if NDP2 did not change the action for states already in the policy, then when NDP2 integrated a
plan that went from a state $s$ through a $\pi$-ancestor $s'$ of $s$, it could create an inescapable cycle.

Case 2. Find-Acceptable-Plan returns failure. When Find-Acceptable-Plan returns failure on a state $s$, actions which lead to
$s$ are removed from $\pi$. So any state which claimed $s$ as a $\pi$-descendant can now claim one of $s$'s parents as a
$\pi$-result. □

Lemma  9.  If  CP  is  sound,  NDP2  is  sound. 

Proof. If NDP2 returns a policy, by the previous lemma all $\pi$-descendants of the initial state have a path to a non-goal
$\pi$-result or a goal state. Since NDP2 terminated without failure, there are no more non-goal $\pi$-results in the policy, so all
states have a path to a goal state, and $\pi$ is a valid strong cyclic plan. □
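The property that Lemma 9 establishes can be tested directly on an explicit policy. The sketch below is a straightforward reachability check, not part of NDP2 itself; the representation of $\gamma$ as a dictionary and the names `descendants` and `is_strong_cyclic` are assumptions for illustration.

```python
def descendants(policy, gamma, starts):
    """States reachable from `starts` by following the policy (starts included)."""
    seen, stack = set(starts), list(starts)
    while stack:
        state = stack.pop()
        if state in policy:
            for nxt in gamma[(state, policy[state])]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return seen


def is_strong_cyclic(policy, gamma, initial, goals):
    """Every policy descendant of an initial state must be able to reach a goal."""
    return all(descendants(policy, gamma, [state]) & goals
               for state in descendants(policy, gamma, initial))


# Hypothetical transitions: (state, action) -> set of possible successors.
GAMMA = {
    ("s0", "try"): {"s0", "sg"},   # may loop back, but the goal stays reachable
    ("s0", "go"): {"s1"},
    ("s1", "back"): {"s0"},
}
retry_policy = {"s0": "try"}
loop_policy = {"s0": "go", "s1": "back"}
```

A state with no assigned action and no goal membership (a non-goal $\pi$-result) makes the check fail, which matches the condition under which NDP2 keeps iterating.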

Lemma 10. If CP is sound and complete, at every point in the execution of NDP2 on a nondeterministic problem $P = (D, S_0, G)$, the
set $U$ is a subset of all unsolvable states. Thus any state in $U$ cannot appear in any strong-cyclic solution policy for $P$.

Proof. The proof is by induction on the size of $U$.

Let $s$ be the first state added to $U$, which means Find-Acceptable-Plan returned failure when planning from $s$. Since $U$ is
empty and by Lemma 6 Find-Acceptable-Plan is complete, there is no path to a goal state from $s$, and so $s$ would not be a
$\pi$-descendant of $s_0$ in any valid strong cyclic policy.

Inductive step. Assume $U$ contains only states which may not appear in any strong cyclic policy. Let $s$ be the next state added
to $U$, which means Find-Acceptable-Plan returned failure when planning from that state. Since Find-Acceptable-Plan is
complete, all possible paths from $s$ to a goal state also lead to a state in $U$, and thus $s$ must also not appear in any valid
policy. □

Lemma  11.  If  CP  is  sound  and  complete,  NDP2  is  complete. 

Proof. Proof by contradiction. Assume NDP2 is not complete. Then there is a domain $D$, initial states $S_0$, goal set $G$, and
strong cyclic policy $\pi$ such that NDP2($D$, $S_0$, $G$, CP) returns failure, even though $\pi$ is a valid strong cyclic policy.

This means Find-Acceptable-Plan returned failure from an initial state, and so there is no path to the goal which also
does not lead to a state in $U$. However, $\pi$ has paths from each of the initial states to the goal, and so some action along each
of those paths must lead to a state in $U$. This contradicts the above lemma, which says that $U$ will never contain states that
appear in any strong-cyclic solution. □

Theorem 2. Let CP be a sound and complete classical planner and $P = (D, S_0, G)$ be a nondeterministic planning problem with
$D = (\mathcal{L}, O)$. Let $S$ be the set of all states in $\mathcal{L}$ (i.e., $S = 2^F$, $F$ = {all ground atoms over $\mathcal{L}$}), and $A$ be the set of all possible actions (i.e.,
all possible ground instantiations of the planning operators in $O$).

Then NDP2($D$, $S_0$, $G$, CP) is sound and complete, and returns in at most $|S|^2$ calls to Find-Acceptable-Plan.

Proof.  Immediately  follows  from  Lemmas  7,  9,  and  11.  □ 

Appendix  B.  Extracting  solutions  from  abstract  and  compound-abstract  problems 

Given a problem $P$ and its abstraction $P^*$, let $s^*$ be an abstraction of a state $s$, let $a^*$ from $P^*$ be an action such
that $\gamma_{P^*}(s^*, a^*) = \{s_1^*, \ldots, s_n^*, s'_1, \ldots, s'_j\}$, where states $s'_1, \ldots, s'_j$ are non-abstract, and let $a$ be the action in $P$ whose
abstraction is $a^*$ where $\gamma_P(s, a) = \{s_1, \ldots, s_m, s'_1, \ldots, s'_j\}$ $(n < m)$. Then for each $s_i$ $(i = 1, \ldots, m)$, there is a merge action
merge({...})$_i$ such that the policy $\{(s, a), (s_1, \text{merge}(\{\ldots\})_1), \ldots, (s_m, \text{merge}(\{\ldots\})_m)\}$ has exactly the same $\pi$-results as the
policy $\{(s^*, a^*)\}$. We call this policy the unabstracted image of $(s^*, a^*)$ in $s$. We skip the details of how to find merge({...})$_i$
from $s_i$, since this is just a variant of the maximal abstraction algorithm.

Algorithm 7: Algorithm to map an abstract policy $\pi^*$ into a policy that isn't abstract.

Procedure unabstract($P = ((\mathcal{L}, O), S_0, G)$, $P^*$, $\pi^*$)
    while $\exists (s, a^*) \in \pi^*$ where $s$ is not abstract and $a^* \notin O$ do
        Let $s^*$ be the $\pi^*$-corresponding state of $s$
        Let $\{(s, a), (s_1, \text{merge}(\{\ldots\})_1), \ldots, (s_n, \text{merge}(\{\ldots\})_n)\}$ be the unabstracted image of $(s^*, a^*)$ in $s$
        $\pi^* \leftarrow (\pi^* \setminus \{(s, \pi^*(s))\}) \cup \{(s, a)\}$
        $\pi^* \leftarrow \pi^* \cup \{(s_i, \text{merge}(\{\ldots\})_i) \mid \pi^*(s_i) \text{ is not defined}\}$
        if $s$ has no goal $\pi^*$-descendants then
            Pick $s' \in \gamma(s, a^*)$ such that $s'$ has a path to the goal
            Let $s''$ be the first state on the path from $s'$ to the goal such that $\pi^*(s'')$ is not a merge({...}) or split-p operator
            Pick $s_j \in \gamma(s, a)$ such that $s''$ is an abstraction of $s_j$
            $\pi^* \leftarrow (\pi^* \setminus \{(s_j, \pi^*(s_j))\}) \cup \{(s_j, \text{merge}(\{\ldots\})_j)\}$
    Remove any non-$\pi^*$-descendants of $S_0$
    return $\pi^*$


Algorithm 8: Algorithm to map a compound-abstract policy into an abstract policy that isn't compound.

Procedure Map-to-Uncompound($\pi^{**}$, $O^{**}$, £)
    while $\pi^{**}$ contains any compound actions do
        foreach $(s, a) \in \pi^{**}$ do
            if $a$ has the form split-p(...) · o(...) where split-p ∈ £ and $o \in O^{**}$ then
                remove $(s, a)$ from $\pi^{**}$
                $\pi^{**} \leftarrow \pi^{**} \cup \{(s, \text{split-p}(\ldots))\}$
                $s' \leftarrow (s \setminus \{p'(\ldots)\}) \cup p(\ldots)$
                if $\pi^{**}(s')$ has no goal $\pi^{**}$-descendant then
                    remove $(s', \pi^{**}(s'))$ from $\pi^{**}$
                    $\pi^{**} \leftarrow \pi^{**} \cup \{(s', o(\ldots))\}$
    return $\pi^{**}$

For any non-goal non-abstract state $s$, we define the $\pi$-path corresponding to state $s$ recursively as a sequence of states,
starting with $s$. If $s'$ is on the $\pi$-path corresponding to $s$, then:

• If $\pi(s')$ is not a split-p or merge({...}) action, then $s'$ is the last state on the $\pi$-path corresponding to $s$.

• If $\pi(s')$ is the action merge({...}), then $\gamma(s', \text{merge}(\{\ldots\}))$ is the next state in the $\pi$-path corresponding to $s$.

• If $\pi(s')$ is the action split-p(...) and $\gamma(s', \text{split-p}(\ldots)) = \{s_1, s_2\}$, then only one of $s_1$ or $s_2$ is consistent with $s$, and that
state is on the $\pi$-path corresponding to $s$.

If the $\pi$-path corresponding to $s$ is finite (it does not loop forever), then the last state $s'$ is the $\pi$-corresponding state to $s$,
and the action $\pi(s')$ is executable in state $s$.

Given a solution $\pi^*$ for $P^*$, Algorithm 7 can extract a solution for $P$. It loops over the solution, picking a non-abstract
state $s$ where it assigns an action with abstract effects. After the first iteration, this may include states where the current
policy assigns a merge({...}) action.

Algorithm 7 then finds the $\pi$-corresponding state $s^*$ (which is potentially equal to $s$) and replaces $\pi(s)$ with $\pi(s^*)$.
Algorithm 7 then, for every child $s_i$ of $s$ for which $\pi(s_i)$ is undefined, adds the corresponding action from the unabstracted
image of $(s^*, a^*)$.

The policy $\pi$ should be a strong cyclic solution to a partially unabstracted $P^*$ after every iteration of the main loop. So
if after merging the unabstracted image of $(s^*, a^*)$, $s$ no longer has a path to the goal, then Algorithm 7 picks a state in
the children of $s$ and replaces the policy for it with a merge({...}) action pointing to one of the former children of $s$ (one in
$\gamma(s, a^*)$). Algorithm 7 terminates when no abstracted actions are left in the solution.

Let $P^{**}$ be a compound-abstract version of $P^*$, and $\pi^{**}$ be a solution for $P^{**}$. Algorithm 8 is an algorithm to extract an
abstract plan $\pi^*$ from $\pi^{**}$. It works by iterating over the state-action pairs in $\pi^{**}$, modifying them to translate each compound
action split-p(...) · $o^{**}$(...) into its components split-p(...) and $o^{**}$(...). With each pass of the main loop, Algorithm 8
reduces the maximum amount of compounding. It terminates when none of the actions in $\pi^{**}$ is compound.

Appendix  C.  Unsoundness  of  NDP 

Algorithm 9 is the NDP algorithm from [30]. Unlike NDP2, NDP incorporates a plan by first converting it to a policy
(presumably by first removing any cycles from the plan), and then incorporating any state-action pairs for any
state where the current policy is undefined. Here we show by example that this is unsound: NDP, if used in the presence
of unsolvable states, may return policies which are not strong cyclic solutions.
Algorithm 9: The NDP algorithm from [30], which is unsound in domains which have unsolvable states.

Procedure NDP($D$, $S_0$, $G$, CP)
    $\pi \leftarrow \emptyset$
    $\overline{D} \leftarrow$ a determinization of $D$; $U \leftarrow \emptyset$
    loop
        $S \leftarrow$ {all non-goal $\pi$-results of $S_0$}
        if $S = \emptyset$ then return $\pi$
        arbitrarily select a state $s \in S$
        call CP($\overline{D}$, $s$, $G$)
        if CP returns a solution plan $p$ then
            Let $\pi'$ be the policy image of $p$ applied at $s$
            $\pi \leftarrow \pi \cup \{(s, a) \in \pi' \mid \pi(s) \text{ is not defined}\}$
        else  // CP returned Failure
            if $s \in S_0$ then return Failure
            foreach $s'$ such that $s \in \gamma(s', \pi(s'))$ do
                modify $\overline{D}$ to make the determinizations of $\pi(s')$ inapplicable at $s'$
                $\pi \leftarrow \pi \setminus \{(s', \pi(s'))\}$


Fig. C.15. A classical solution to $(\overline{D}, s_0, s_g)$ and its incorporation into the empty policy.


Fig. C.16. A classical solution to $(\overline{D}, s_1, s_g)$ that avoids $s_3$, and the resulting policy produced by NDP.

Example. Consider the nondeterministic planning problem $P = (D, S_0, G)$ where $S_0 = \{s_0\}$ and $G = \{s_g\}$. Fig. C.14 illustrates
the nondeterministic planning problem $P$. In the figure, $\overline{D}$ is the determinization of $D$.

In NDP's first iteration, let CP return the shortest classical solution: $(a_0, a_{31})$ (Fig. C.15). After NDP incorporates the plan
into the current policy $\pi$, there will be one non-goal $\pi$-result of $S_0$, namely $s_3$. On the next iteration, NDP will select $s_3$, and call
CP on the problem $(\overline{D}, s_3, G)$. Since there are no solutions to that problem, NDP will remove $(s_1, a_3)$ from $\pi$ and will make
$a_{31}$ and $a_{32}$ inapplicable at $s_1$ in $\overline{D}$.

Now $s_1$ is the only non-goal $\pi$-result of $S_0$. NDP will now call CP on the classical planning problem $(\overline{D}, s_1, G)$. CP will
return the plan $(a_1, a_2, a_4, a_5, a_6)$. NDP will incorporate just the first two actions from the plan, but it will not incorporate
$a_4$, since $s_0$ already has an action, i.e., $a_0$, in the current policy. This leaves us with the policy shown on the right in Fig. C.16,
where there is an inescapable loop between $s_0$, $s_1$, and $s_2$. There are now no non-goal $\pi$-results of $S_0$, and so NDP will exit
on the next iteration of the loop, returning the policy found in Fig. C.16, which is not a solution to the original problem. □

So  by  never  changing  the  action  already  associated  with  a  state,  NDP  can  create  loops  where  states  have  no  path  to 
the  goal.  This  violates  one  of  the  invariants  that  makes  NDP2  work,  which  is  that  after  every  iteration,  every  state  in  the 
execution  structure  has  a  path  to  a  goal  or  leaf  state.  This  invariant  is  made  explicit  and  proven  in  Lemma  8  in  Appendix  A. 
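The difference between the two incorporation rules can be reproduced with a short Python sketch. Fig. C.14 is not reproduced here, so the transition table below is a hypothetical reconstruction based only on the actions named in the example, and `incorporate_overwrite` is a simplified stand-in for NDP2's merge step rather than its actual pseudocode.

```python
# Hypothetical transitions inferred from the example: (state, action) -> successors.
GAMMA = {
    ("s0", "a0"): {"s1"},
    ("s1", "a3"): {"sg", "s3"},   # s3 is the unsolvable state
    ("s1", "a1"): {"s2"},
    ("s2", "a2"): {"s0"},
    ("s0", "a4"): {"s4"},
    ("s4", "a5"): {"s5"},
    ("s5", "a6"): {"sg"},
}


def incorporate_keep_existing(policy, plan_image):
    """NDP-style: only add pairs for states where the policy is undefined."""
    merged = dict(policy)
    for state, action in plan_image:
        merged.setdefault(state, action)
    return merged


def incorporate_overwrite(policy, plan_image):
    """Simplified NDP2-style: the new plan's actions replace existing ones."""
    merged = dict(policy)
    merged.update(plan_image)
    return merged


def goal_reachable(policy, start, goals):
    """Is some goal reachable from start by following the policy?"""
    seen, stack = {start}, [start]
    while stack:
        state = stack.pop()
        if state in goals:
            return True
        for nxt in GAMMA.get((state, policy.get(state)), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


policy_after_failure = {"s0": "a0"}            # (s1, a3) was already removed
plan_image = [("s1", "a1"), ("s2", "a2"), ("s0", "a4"), ("s4", "a5"), ("s5", "a6")]
ndp_policy = incorporate_keep_existing(policy_after_failure, plan_image)
ndp2_like_policy = incorporate_overwrite(policy_after_failure, plan_image)
```

Under the keep-existing rule, $s_0$ retains $a_0$ and the policy cycles through $s_0$, $s_1$, and $s_2$ without ever reaching the goal; the pairs for `s4` and `s5` are present in the merged dictionary but unreachable from $s_0$. Under the overwriting rule, $s_0$ switches to $a_4$ and the goal becomes reachable.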